{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,7,28]],"date-time":"2024-07-28T05:17:35Z","timestamp":1722143855319},"reference-count":37,"publisher":"Springer Science and Business Media LLC","issue":"S10","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2011,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:sec>\n            <jats:title>Background<\/jats:title>\n            <jats:p>Identification of transcription factors (TFs) responsible for modulation of differentially expressed genes is a key step in deducing gene regulatory pathways. Most current methods identify TFs by searching for presence of DNA binding motifs in the promoter regions of co-regulated genes. However, this strategy may not always be useful as presence of a motif does not necessarily imply a regulatory role. Conversely, motif presence may not be required for a TF to regulate a set of genes. Therefore, it is imperative to include functional (biochemical and molecular) associations, such as those found in the biomedical literature, into algorithms for identification of putative regulatory TFs that might be explicitly or implicitly linked to the genes under investigation.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Results<\/jats:title>\n            <jats:p>In this study, we present a Latent Semantic Indexing (LSI) based text mining approach for identification and ranking of putative regulatory TFs from microarray derived differentially expressed genes (DEGs). Two LSI models were built using different term weighting schemes to devise pair-wise similarities between 21,027 mouse genes annotated in the Entrez Gene repository. Amongst these genes, 433 were designated TFs in the TRANSFAC database. 
The LSI derived TF-to-gene similarities were used to calculate TF literature enrichment p-values and rank the TFs for a given set of genes. We evaluated our approach using five different publicly available microarray datasets focusing on TFs <jats:italic>Rel<\/jats:italic>, <jats:italic>Stat6<\/jats:italic>, <jats:italic>Ddit3<\/jats:italic>, <jats:italic>Stat5<\/jats:italic> and <jats:italic>Nfic<\/jats:italic>. In addition, for each of the datasets, we constructed gold standard TFs known to be functionally relevant to the study in question. Receiver Operating Characteristics (ROC) curves showed that the log-entropy LSI model outperformed the <jats:italic>tf<\/jats:italic>-normal LSI model and a benchmark co-occurrence based method for four out of five datasets, as well as motif searching approaches, in identifying putative TFs.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Conclusions<\/jats:title>\n            <jats:p>Our results suggest that our LSI based text mining approach can complement existing approaches used in systems biology research to decipher gene regulatory networks by providing putative lists of ranked TFs that might be explicitly or implicitly associated with sets of DEGs derived from microarray experiments. 
In addition, unlike motif searching approaches, LSI based approaches can reveal TFs that may indirectly regulate genes.<\/jats:p>\n          <\/jats:sec>","DOI":"10.1186\/1471-2105-12-s10-s19","type":"journal-article","created":{"date-parts":[[2011,10,20]],"date-time":"2011-10-20T06:22:43Z","timestamp":1319091763000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":14,"title":["Latent Semantic Indexing of PubMed abstracts for identification of transcription factor candidates from microarray derived gene sets"],"prefix":"10.1186","volume":"12","author":[{"given":"Sujoy","family":"Roy","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Kevin","family":"Heinrich","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Vinhthuy","family":"Phan","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Michael W","family":"Berry","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ramin","family":"Homayouni","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2011,10,18]]},"reference":[{"key":"4859_CR1","doi-asserted-by":"publisher","first-page":"495","DOI":"10.1186\/1471-2105-9-495","volume":"9","author":"M Hestand","year":"2008","unstructured":"Hestand M, Galen VanM, Villerius M, et al.: CORE_TF: a user-friendly interface to identify evolutionary conserved transcription factor binding sites in sets of co-regulated genes. BMC bioinformatics 2008, 9: 495. 
10.1186\/1471-2105-9-495","journal-title":"BMC bioinformatics"},{"key":"4859_CR2","doi-asserted-by":"publisher","first-page":"W245","DOI":"10.1093\/nar\/gkm427","volume":"35","author":"SJ Ho Sui","year":"2007","unstructured":"Ho Sui SJ, Fulton DL, Arenillas DJ, Kwon AT, Wasserman WW: oPOSSUM: integrated tools for analysis of regulatory motif over-representation. Nucleic acids research 2007, 35: W245. 10.1093\/nar\/gkm427","journal-title":"Nucleic acids research"},{"key":"4859_CR3","doi-asserted-by":"publisher","first-page":"D108","DOI":"10.1093\/nar\/gkj143","volume":"34","author":"V Matys","year":"2006","unstructured":"Matys V, Kel-Margoulis OV, Fricke E, et al.: TRANSFAC\u00ae and its module TRANSCompel\u00ae: transcriptional gene regulation in eukaryotes. Nucleic acids research 2006, 34: D108. 10.1093\/nar\/gkj143","journal-title":"Nucleic acids research"},{"key":"4859_CR4","doi-asserted-by":"publisher","first-page":"2933","DOI":"10.1093\/bioinformatics\/bti473","volume":"21","author":"K Cartharius","year":"2005","unstructured":"Cartharius K, Frech K, Grote K, et al.: MatInspector and beyond: promoter analysis based on transcription factor binding sites. Bioinformatics 2005, 21: 2933. 10.1093\/bioinformatics\/bti473","journal-title":"Bioinformatics"},{"key":"4859_CR5","doi-asserted-by":"publisher","first-page":"D102","DOI":"10.1093\/nar\/gkm955","volume":"36","author":"JC Bryne","year":"2008","unstructured":"Bryne JC, Valen E, Tang MHE, et al.: JASPAR, the open access database of transcription factor-binding profiles: new content and tools in the 2008 update. Nucleic acids research 2008, 36: D102. 
10.1093\/nar\/gkm955","journal-title":"Nucleic acids research"},{"key":"4859_CR6","doi-asserted-by":"publisher","first-page":"291","DOI":"10.1016\/j.ceb.2006.04.002","volume":"18","author":"LO Barrera","year":"2006","unstructured":"Barrera LO, Ren B: The transcriptional regulatory code of eukaryotic cells-insights from genome-wide analysis of chromatin organization and transcription factor binding. Current opinion in cell biology 2006, 18: 291\u2013298. 10.1016\/j.ceb.2006.04.002","journal-title":"Current opinion in cell biology"},{"key":"4859_CR7","volume-title":"Wiley Interdisciplinary Reviews: Systems Biology and Medicine","author":"TM Kim","year":"2010","unstructured":"Kim TM, Park PJ: Advances in analysis of transcriptional regulatory networks. Wiley Interdisciplinary Reviews: Systems Biology and Medicine 2010."},{"key":"4859_CR8","volume-title":"Modern information retrieval","author":"R Baeza-Yates","year":"1999","unstructured":"Baeza-Yates R, Ribeiro-Neto B: Modern information retrieval. Volume 463. ACM press New York; 1999."},{"key":"4859_CR9","doi-asserted-by":"publisher","first-page":"7","DOI":"10.1353\/pbm.1986.0087","volume":"30","author":"DR Swanson","year":"1986","unstructured":"Swanson DR: Fish oil, Raynaud\u2019s syndrome, and undiscovered public knowledge. Perspectives in biology and medicine 1986, 30: 7.","journal-title":"Perspectives in biology and medicine"},{"key":"4859_CR10","doi-asserted-by":"publisher","first-page":"51","DOI":"10.1186\/1471-2105-6-51","volume":"6","author":"B Alako","year":"2005","unstructured":"Alako B, Veldhoven A, Baal VanS, et al.: CoPub Mapper: mining MEDLINE based on search term co-publication. BMC bioinformatics 2005, 6: 51. 
10.1186\/1471-2105-6-51","journal-title":"BMC bioinformatics"},{"key":"4859_CR11","first-page":"21","volume":"28","author":"TK Jenssen","year":"2001","unstructured":"Jenssen TK, L\u00e6greid A, Komorowski J, Hovig E: A literature network of human genes for high-throughput analysis of gene expression. Nature genetics 2001, 28: 21\u201328.","journal-title":"Nature genetics"},{"key":"4859_CR12","doi-asserted-by":"publisher","first-page":"147","DOI":"10.1186\/1471-2105-5-147","volume":"5","author":"H Chen","year":"2004","unstructured":"Chen H, Sharp B: Content-rich biological network constructed by mining PubMed abstracts. BMC bioinformatics 2004, 5: 147. 10.1186\/1471-2105-5-147","journal-title":"BMC bioinformatics"},{"key":"4859_CR13","doi-asserted-by":"publisher","first-page":"389","DOI":"10.1093\/bioinformatics\/btg421","volume":"20","author":"JD Wren","year":"2004","unstructured":"Wren JD, Bekeredjian R, Stewart JA, Shohet RV, Garner HR: Knowledge discovery by automated identification and ranking of implicit relationships. Bioinformatics 2004, 20: 389. 10.1093\/bioinformatics\/btg421","journal-title":"Bioinformatics"},{"key":"4859_CR14","doi-asserted-by":"publisher","first-page":"1995","DOI":"10.1093\/bioinformatics\/btm261","volume":"23","author":"MF Burkart","year":"2007","unstructured":"Burkart MF, Wren JD, Herschkowitz JI, Perou CM, Garner HR: Clustering microarray-derived gene lists through implicit literature relationships. Bioinformatics 2007, 23: 1995. 10.1093\/bioinformatics\/btm261","journal-title":"Bioinformatics"},{"key":"4859_CR15","doi-asserted-by":"publisher","first-page":"W230","DOI":"10.1093\/nar\/gkh484","volume":"32","author":"H Pan","year":"2004","unstructured":"Pan H, Zuo L, Choudhary V, et al.: Dragon TF Association Miner: a system for exploring transcription factor associations through text-mining. Nucleic acids research 2004, 32: W230. 
10.1093\/nar\/gkh484","journal-title":"Nucleic acids research"},{"key":"4859_CR16","doi-asserted-by":"publisher","first-page":"645","DOI":"10.1093\/bioinformatics\/bti597","volume":"22","author":"J \u0160aric","year":"2006","unstructured":"\u0160aric J, Jensen LJ, Ouzounova R, Rojas I, Bork P: Extraction of regulatory gene\/protein networks from Medline. Bioinformatics 2006, 22: 645. 10.1093\/bioinformatics\/bti597","journal-title":"Bioinformatics"},{"key":"4859_CR17","doi-asserted-by":"publisher","first-page":"293","DOI":"10.1186\/1471-2105-8-293","volume":"8","author":"C Rodr\u00edguez-Penagos","year":"2007","unstructured":"Rodr\u00edguez-Penagos C, Salgado H, Mart\u00ednez-Flores I, Collado-Vides J: Automatic reconstruction of a bacterial regulatory network using Natural Language Processing. BMC bioinformatics 2007, 8: 293. 10.1186\/1471-2105-8-293","journal-title":"BMC bioinformatics"},{"key":"4859_CR18","doi-asserted-by":"publisher","first-page":"887","DOI":"10.1016\/j.jbi.2009.04.001","volume":"42","author":"H Yang","year":"2009","unstructured":"Yang H, Keane J, Bergman CM, Nenadic G: Assigning roles to protein mentions: The case of transcription factors. Journal of biomedical informatics 2009, 42: 887\u2013894. 10.1016\/j.jbi.2009.04.001","journal-title":"Journal of biomedical informatics"},{"key":"4859_CR19","doi-asserted-by":"publisher","first-page":"104","DOI":"10.1093\/bioinformatics\/bth464","volume":"21","author":"R Homayouni","year":"2005","unstructured":"Homayouni R, Heinrich K, Wei L, Berry MW: Gene clustering by latent semantic indexing of MEDLINE abstracts. Bioinformatics 2005, 21: 104. 10.1093\/bioinformatics\/bth464","journal-title":"Bioinformatics"},{"key":"4859_CR20","volume-title":"Statistical data mining and knowledge discovery","author":"JT Giles","year":"2001","unstructured":"Giles JT, Wo L, Berry MW: GTP (General Text Parser) software for text mining. 
Statistical data mining and knowledge discovery 2001."},{"key":"4859_CR21","unstructured":"SMART stoplist[ftp:\/\/ftp.cs.cornell.edu\/pub\/smart\/english.stop]"},{"key":"4859_CR22","volume-title":"Soc for Industrial & Applied Math","author":"MW Berry","year":"2005","unstructured":"Berry MW, Browne M: Understanding search engines: mathematical modeling and text retrieval. Soc for Industrial & Applied Math 2005., 8:"},{"key":"4859_CR23","volume-title":"Numerical recipes in C: the art of scientific computing","author":"WH Press","year":"1992","unstructured":"Press WH: Numerical recipes in C: the art of scientific computing. Cambridge University Press; 1992."},{"key":"4859_CR24","doi-asserted-by":"publisher","first-page":"D885","DOI":"10.1093\/nar\/gkn764","volume":"37","author":"T Barrett","year":"2009","unstructured":"Barrett T, Troup DB, Wilhite SE, et al.: NCBI GEO: archive for high-throughput functional genomic data. Nucleic acids research 2009, 37: D885. 10.1093\/nar\/gkn764","journal-title":"Nucleic acids research"},{"key":"4859_CR25","unstructured":"National Center for Biotechnology Information[http:\/\/www.ncbi.nlm.nih.gov\/]"},{"key":"4859_CR26","doi-asserted-by":"publisher","first-page":"31304","DOI":"10.1074\/jbc.M308975200","volume":"279","author":"LM Pfeffer","year":"2004","unstructured":"Pfeffer LM, Kim JG, Pfeffer SR, et al.: Role of nuclear factor-\u03baB in the antiviral action of interferon and interferon-regulated gene expression. Journal of Biological Chemistry 2004, 279: 31304. 10.1074\/jbc.M308975200","journal-title":"Journal of Biological Chemistry"},{"key":"4859_CR27","doi-asserted-by":"publisher","first-page":"3311","DOI":"10.1182\/blood-2010-02-271981","volume":"116","author":"S Huber","year":"2010","unstructured":"Huber S, Hoffmann R, Muskens F, Voehringer D: Alternatively activated macrophages inhibit T-cell proliferation by Stat6-dependent expression of PD-L2. Blood 2010, 116: 3311. 
10.1182\/blood-2010-02-271981","journal-title":"Blood"},{"key":"4859_CR28","doi-asserted-by":"publisher","first-page":"3066","DOI":"10.1101\/gad.1250704","volume":"18","author":"SJ Marciniak","year":"2004","unstructured":"Marciniak SJ, Yun CY, Oyadomari S, et al.: CHOP induces death by promoting protein synthesis and oxidation in the stressed endoplasmic reticulum. Genes & development 2004, 18: 3066. 10.1101\/gad.1250704","journal-title":"Genes & development"},{"key":"4859_CR29","doi-asserted-by":"publisher","first-page":"1808","DOI":"10.1002\/hep.23882","volume":"52","author":"JH Yu","year":"2010","unstructured":"Yu JH, Zhu BM, Wickre M, et al.: The transcription factors signal transducer and activator of transcription 5A (STAT5A) and STAT5B negatively regulate cell proliferation through the activation of cyclin-dependent kinase inhibitor 2b (Cdkn2b) and Cdkn1a expression. Hepatology 2010, 52: 1808\u20131818. 10.1002\/hep.23882","journal-title":"Hepatology"},{"key":"4859_CR30","doi-asserted-by":"publisher","first-page":"6006","DOI":"10.1128\/MCB.01921-08","volume":"29","author":"G Plasari","year":"2009","unstructured":"Plasari G, Calabrese A, Dusserre Y, et al.: Nuclear Factor IC Links Platelet-Derived Growth Factor and Transforming Growth Factor \u03b21 Signaling to Skin Wound Healing Progression. Molecular and cellular biology 2009, 29: 6006. 10.1128\/MCB.01921-08","journal-title":"Molecular and cellular biology"},{"key":"4859_CR31","first-page":"28","volume-title":"Biometrika","author":"BL Welch","year":"1947","unstructured":"Welch BL: The generalization of student\u2019s problem when several different population variances are involved. Biometrika 1947, 28\u201335."},{"key":"4859_CR32","doi-asserted-by":"publisher","first-page":"541","DOI":"10.1089\/jir.2007.0136","volume":"28","author":"L Wei","year":"2008","unstructured":"Wei L, Fan M, Xu L, et al.: Bioinformatic analysis reveals cRel as a regulator of a subset of interferon-stimulated genes. 
Journal of Interferon & Cytokine Research 2008, 28: 541\u2013552. 10.1089\/jir.2007.0136","journal-title":"Journal of Interferon & Cytokine Research"},{"key":"4859_CR33","doi-asserted-by":"publisher","first-page":"283","DOI":"10.1016\/S0001-2998(78)80014-2","volume":"8","author":"CE Metz","year":"1978","unstructured":"Metz CE: Basic principles of ROC analysis. Seminars in nuclear medicine 1978, 8: 283\u2013298. 10.1016\/S0001-2998(78)80014-2","journal-title":"Seminars in nuclear medicine"},{"key":"4859_CR34","doi-asserted-by":"publisher","first-page":"29","DOI":"10.1148\/radiology.143.1.7063747","volume":"143","author":"JA Hanley","year":"1982","unstructured":"Hanley JA, McNeil BJ: The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982, 143: 29\u201336.","journal-title":"Radiology"},{"key":"4859_CR35","doi-asserted-by":"publisher","first-page":"4046","DOI":"10.1093\/bioinformatics\/bti657","volume":"21","author":"JD Wren","year":"2005","unstructured":"Wren JD, Hildebrand WH, Chandrasekaran S, Melcher U: Markov model recognition and classification of DNA\/protein sequences within large text databases. Bioinformatics 2005, 21: 4046. 10.1093\/bioinformatics\/bti657","journal-title":"Bioinformatics"},{"key":"4859_CR36","doi-asserted-by":"publisher","first-page":"787","DOI":"10.1016\/j.bbrc.2004.07.179","volume":"322","author":"M Kanamori","year":"2004","unstructured":"Kanamori M, Konno H, Osato N, et al.: A genome-wide and nonredundant mouse transcription factor database. Biochemical and biophysical research communications 2004, 322: 787\u2013793. 10.1016\/j.bbrc.2004.07.179","journal-title":"Biochemical and biophysical research communications"},{"key":"4859_CR37","unstructured":"VENNY. 
An interactive tool for comparing lists with Venn Diagrams[http:\/\/bioinfogp.cnb.csic.es\/tools\/venny\/index.html]"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-12-S10-S19.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,1]],"date-time":"2021-09-01T16:39:10Z","timestamp":1630514350000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-12-S10-S19"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2011,10,18]]},"references-count":37,"journal-issue":{"issue":"S10","published-print":{"date-parts":[[2011,12]]}},"alternative-id":["4859"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-12-s10-s19","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2011,10,18]]},"assertion":[{"value":"18 October 2011","order":1,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"S19"}}