{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2022,4,2]],"date-time":"2022-04-02T19:09:05Z","timestamp":1648926545447},"reference-count":40,"publisher":"Springer Science and Business Media LLC","issue":"1","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2010,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:sec>\n            <jats:title>Background<\/jats:title>\n            <jats:p>Large-scale genomic studies often identify large gene lists, for example, the genes sharing the same expression patterns. The interpretation of these gene lists is generally achieved by extracting concepts overrepresented in the gene lists. This analysis often depends on manual annotation of genes based on controlled vocabularies, in particular, Gene Ontology (GO). However, the annotation of genes is a labor-intensive process; and the vocabularies are generally incomplete, leaving some important biological domains inadequately covered.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Results<\/jats:title>\n            <jats:p>We propose a statistical method that uses the primary literature, i.e. free-text, as the source to perform overrepresentation analysis. The method is based on a statistical framework of mixture model and addresses the methodological flaws in several existing programs. We implemented this method within a literature mining system, BeeSpace, taking advantage of its analysis environment and added features that facilitate the interactive analysis of gene sets. 
Through experimentation with several datasets, we showed that our program can effectively summarize the important conceptual themes of large gene sets, even when traditional GO-based analysis does not yield informative results.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Conclusions<\/jats:title>\n            <jats:p>We conclude that the current work will provide biologists with a tool that effectively complements the existing ones for overrepresentation analysis from genomic experiments. Our program, Genelist Analyzer, is freely available at: <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" xlink:href=\"http:\/\/workerbee.igb.uiuc.edu:8080\/BeeSpace\/Search.jsp\" ext-link-type=\"uri\">http:\/\/workerbee.igb.uiuc.edu:8080\/BeeSpace\/Search.jsp<\/jats:ext-link>\n            <\/jats:p>\n          <\/jats:sec>","DOI":"10.1186\/1471-2105-11-272","type":"journal-article","created":{"date-parts":[[2010,5,20]],"date-time":"2010-05-20T18:15:38Z","timestamp":1274379338000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":4,"title":["Identifying overrepresented concepts in gene lists from literature: a statistical approach based on Poisson mixture model"],"prefix":"10.1186","volume":"11","author":[{"given":"Xin","family":"He","sequence":"first","affiliation":[]},{"given":"Moushumi Sen","family":"Sarma","sequence":"additional","affiliation":[]},{"given":"Xu","family":"Ling","sequence":"additional","affiliation":[]},{"given":"Brant","family":"Chee","sequence":"additional","affiliation":[]},{"given":"Chengxiang","family":"Zhai","sequence":"additional","affiliation":[]},{"given":"Bruce","family":"Schatz","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2010,5,20]]},"reference":[{"key":"3729_CR1","doi-asserted-by":"publisher","first-page":"47","DOI":"10.1038\/35011540","volume":"402","author":"LH 
Hartwell","year":"1999","unstructured":"Hartwell LH, Hopfield JJ, Leibler S, Murray AW: From molecular to modular cell biology. Nature 1999, 402: 47\u201352. 10.1038\/35011540","journal-title":"Nature"},{"key":"3729_CR2","doi-asserted-by":"publisher","first-page":"4285","DOI":"10.1073\/pnas.96.8.4285","volume":"96","author":"M Pellegrini","year":"1999","unstructured":"Pellegrini M, Marcotte EM, Thompson MJ, Eisenberg D, Yeates TO: Assigning protein functions by comparative genome analysis: protein phylogenetic profiles. Proc Natl Acad Sci USA 1999, 96: 4285\u20134288. 10.1073\/pnas.96.8.4285","journal-title":"Proc Natl Acad Sci USA"},{"key":"3729_CR3","doi-asserted-by":"publisher","first-page":"D262","DOI":"10.1093\/nar\/gkh021","volume":"32","author":"E Camon","year":"2004","unstructured":"Camon E, Magrane M, Barrell D, Lee V, Dimmer E, Maslen J, Binns D, Harte N, Lopez R, Apweiler R: The Gene Ontology Annotation (GOA) Database: sharing knowledge in Uniprot with Gene Ontology. Nucleic Acids Res 2004, 32: D262\u2013266. 10.1093\/nar\/gkh021","journal-title":"Nucleic Acids Res"},{"key":"3729_CR4","doi-asserted-by":"publisher","first-page":"R28","DOI":"10.1186\/gb-2003-4-4-r28","volume":"4","author":"BR Zeeberg","year":"2003","unstructured":"Zeeberg BR, Feng W, Wang G, Wang MD, Fojo AT, Sunshine M, Narasimhan S, Kane DW, Reinhold WC, Lababidi S, Bussey KJ, Riss J, Barrett JC, Weinstein JN: GoMiner: a resource for biological interpretation of genomic and proteomic data. Genome Biol 2003, 4: R28. 10.1186\/gb-2003-4-4-r28","journal-title":"Genome Biol"},{"key":"3729_CR5","doi-asserted-by":"publisher","first-page":"R101","DOI":"10.1186\/gb-2004-5-12-r101","volume":"5","author":"D Martin","year":"2004","unstructured":"Martin D, Brun C, Remy E, Mouren P, Thieffry D, Jacq B: GOToolBox: functional analysis of gene datasets based on Gene Ontology. Genome Biol 2004, 5: R101. 
10.1186\/gb-2004-5-12-r101","journal-title":"Genome Biol"},{"key":"3729_CR6","doi-asserted-by":"publisher","first-page":"R70","DOI":"10.1186\/gb-2003-4-10-r70","volume":"4","author":"DA Hosack","year":"2003","unstructured":"Hosack DA, Dennis G, Sherman BT, Lane HC, Lempicki RA: Identifying biological themes within lists of genes with EASE. Genome Biol 2003, 4: R70. 10.1186\/gb-2003-4-10-r70","journal-title":"Genome Biol"},{"key":"3729_CR7","doi-asserted-by":"publisher","first-page":"9","DOI":"10.1016\/j.cell.2008.06.029","volume":"134","author":"A Rzhetsky","year":"2008","unstructured":"Rzhetsky A, Seringhaus M, Gerstein M: Seeking a new biology through text mining. Cell 2008, 134: 9\u201313. 10.1016\/j.cell.2008.06.029","journal-title":"Cell"},{"issue":"Suppl 2","key":"3729_CR8","doi-asserted-by":"publisher","first-page":"S7","DOI":"10.1186\/gb-2008-9-s2-s7","volume":"9","author":"RB Altman","year":"2008","unstructured":"Altman RB, Bergman CM, Blake J, Blaschke C, Cohen A, Gannon F, Grivell L, Hahn U, Hersh W, Hirschman L, Jensen LJ, Krallinger M, Mons B, O'Donoghue SI, Peitsch MC, Rebholz-Schuhmann D, Shatkay H, Valencia A: Text mining for biology-the way forward: opinions from leading scientists. Genome Biol 2008, 9(Suppl 2):S7. 10.1186\/gb-2008-9-s2-s7","journal-title":"Genome Biol"},{"key":"3729_CR9","doi-asserted-by":"publisher","first-page":"RESEARCH0055","DOI":"10.1186\/gb-2002-3-10-research0055","volume":"3","author":"D Chaussabel","year":"2002","unstructured":"Chaussabel D, Sher A: Mining microarray expression data by literature profiling. Genome Biol 2002, 3: RESEARCH0055. 10.1186\/gb-2002-3-10-research0055","journal-title":"Genome Biol"},{"key":"3729_CR10","doi-asserted-by":"publisher","first-page":"1582","DOI":"10.1101\/gr.116402","volume":"12","author":"S Raychaudhuri","year":"2002","unstructured":"Raychaudhuri S, Sch\u00fctze H, Altman RB: Using text analysis to identify functionally coherent gene groups. Genome Res 2002, 12: 1582\u20131590. 
10.1101\/gr.116402","journal-title":"Genome Res"},{"key":"3729_CR11","doi-asserted-by":"publisher","first-page":"104","DOI":"10.1093\/bioinformatics\/bth464","volume":"21","author":"R Homayouni","year":"2005","unstructured":"Homayouni R, Heinrich K, Wei L, Berry MW: Gene clustering by latent semantic indexing of MEDLINE abstracts. Bioinformatics 2005, 21: 104\u2013115. 10.1093\/bioinformatics\/bth464","journal-title":"Bioinformatics"},{"issue":"Suppl 2","key":"3729_CR12","doi-asserted-by":"publisher","first-page":"i259","DOI":"10.1093\/bioinformatics\/bti1143","volume":"21","author":"R Kuffner","year":"2005","unstructured":"Kuffner R, Fundel K, Zimmer R: Expert knowledge without the expert: integrated analysis of gene expression and literature to derive active functional contexts. Bioinformatics 2005, 21(Suppl 2):i259\u2013267.","journal-title":"Bioinformatics"},{"key":"3729_CR13","doi-asserted-by":"publisher","first-page":"41","DOI":"10.1186\/1471-2105-7-41","volume":"7","author":"M Chagoyen","year":"2006","unstructured":"Chagoyen M, Carmona-Saez P, Shatkay H, Carazo JM, Pascual-Montano A: Discovering semantic features in the literature: a foundation for building functional associations. BMC Bioinformatics 2006, 7: 41. 10.1186\/1471-2105-7-41","journal-title":"BMC Bioinformatics"},{"key":"3729_CR14","doi-asserted-by":"publisher","first-page":"W153","DOI":"10.1093\/nar\/gkp392","volume":"37","author":"M Vazquez","year":"2009","unstructured":"Vazquez M, Carmona-Saez P, Nogales-Cadenas R, Chagoyen M, Tirado F, Carazo JM, Pascual-Montano A: SENT: semantic features in text. Nucleic Acids Res 2009, 37: W153\u2013159. 
10.1093\/nar\/gkp392","journal-title":"Nucleic Acids Res"},{"key":"3729_CR15","doi-asserted-by":"publisher","first-page":"14","DOI":"10.1186\/1471-2105-8-14","volume":"8","author":"R Jelier","year":"2007","unstructured":"Jelier R, Jenster G, Dorssers LC, Wouters BJ, Hendriksen PJ, Mons B, Delwel R, Kors JA: Text-derived concept profiles support assessment of DNA microarray data for acute myeloid leukemia and for androgen receptor stimulation. BMC Bioinformatics 2007, 8: 14. 10.1186\/1471-2105-8-14","journal-title":"BMC Bioinformatics"},{"key":"3729_CR16","doi-asserted-by":"publisher","first-page":"R96","DOI":"10.1186\/gb-2008-9-6-r96","volume":"9","author":"R Jelier","year":"2008","unstructured":"Jelier R, Schuemie MJ, Veldhoven A, Dorssers LC, Jenster G, Kors JA: Anni 2.0: a multipurpose text-mining tool for the life sciences. Genome Biol 2008, 9: R96. 10.1186\/gb-2008-9-6-r96","journal-title":"Genome Biol"},{"key":"3729_CR17","doi-asserted-by":"publisher","first-page":"256","DOI":"10.1007\/s101420000036","volume":"1","author":"C Blaschke","year":"2001","unstructured":"Blaschke C, Oliveros JC, Valencia A: Mining functional information associated with expression arrays. Funct Integr Genomics 2001, 1: 256\u2013268. 10.1007\/s101420000036","journal-title":"Funct Integr Genomics"},{"key":"3729_CR18","doi-asserted-by":"publisher","first-page":"R43","DOI":"10.1186\/gb-2004-5-6-r43","volume":"5","author":"P Glenisson","year":"2004","unstructured":"Glenisson P, Coessens B, Van Vooren S, Mathys J, Moreau Y, De Moor B: TXTGate: profiling gene groups with text-based information. Genome Biol 2004, 5: R43. 
10.1186\/gb-2004-5-6-r43","journal-title":"Genome Biol"},{"key":"3729_CR19","doi-asserted-by":"publisher","first-page":"3324","DOI":"10.1093\/bioinformatics\/bti503","volume":"21","author":"A Djebbari","year":"2005","unstructured":"Djebbari A, Karamycheva S, Howe E, Quackenbush J: MeSHer: identifying biological concepts in microarray assays based on PubMed references and MeSH terms. Bioinformatics 2005, 21: 3324\u20133326. 10.1093\/bioinformatics\/bti503","journal-title":"Bioinformatics"},{"key":"3729_CR20","doi-asserted-by":"publisher","first-page":"12","DOI":"10.1186\/1471-2105-6-12","volume":"6","author":"R Rubinstein","year":"2005","unstructured":"Rubinstein R, Simon I: MILANO-custom annotation of microarray results using automatic literature searches. BMC Bioinformatics 2005, 6: 12. 10.1186\/1471-2105-6-12","journal-title":"BMC Bioinformatics"},{"key":"3729_CR21","doi-asserted-by":"publisher","first-page":"e79","DOI":"10.1093\/nar\/gkp310","volume":"37","author":"HS Leong","year":"2009","unstructured":"Leong HS, Kipling D: Text-based over-representation analysis of microarray gene lists with annotation bias. Nucleic Acids Res 2009, 37: e79. 10.1093\/nar\/gkp310","journal-title":"Nucleic Acids Res"},{"key":"3729_CR22","doi-asserted-by":"publisher","first-page":"26","DOI":"10.1093\/nar\/gkl993","volume":"35","author":"D Maglott","year":"2007","unstructured":"Maglott D, Ostell J, Pruitt KD, Tatusova T: Entrez Gene: gene-centered information at NCBI. Nucleic Acids Res 2007, 35: 26\u201331. 10.1093\/nar\/gkl993","journal-title":"Nucleic Acids Res"},{"key":"3729_CR23","first-page":"40","volume-title":"Pac Symp Biocomput","author":"X Ling","year":"2006","unstructured":"Ling X, Jiang J, He X, Mei Q, Zhai C, Schatz B: Automatically generating gene summaries from biomedical literature. Pac Symp Biocomput 2006, 40\u201351. 
"},{"key":"3729_CR24","doi-asserted-by":"publisher","first-page":"14863","DOI":"10.1073\/pnas.95.25.14863","volume":"95","author":"MB Eisen","year":"1998","unstructured":"Eisen MB, Spellman PT, Brown PO, Botstein D: Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci USA 1998, 95: 14863\u201314868. 10.1073\/pnas.95.25.14863","journal-title":"Proc Natl Acad Sci USA"},{"key":"3729_CR25","doi-asserted-by":"publisher","first-page":"867","DOI":"10.1042\/BJ20031885","volume":"382","author":"A Bruckmann","year":"2004","unstructured":"Bruckmann A, Steensma HY, Teixeira De Mattos MJ, Van Heusden GP: Regulation of transcription by Saccharomyces cerevisiae 14-3-3 proteins. Biochem J 2004, 382: 867\u2013875. 10.1042\/BJ20031885","journal-title":"Biochem J"},{"key":"3729_CR26","doi-asserted-by":"publisher","first-page":"22","DOI":"10.1016\/j.gene.2005.03.040","volume":"354","author":"SM Jazwinski","year":"2005","unstructured":"Jazwinski SM: The retrograde response links metabolism with stress responses, chromatin-dependent gene activation, and genome stability in yeast aging. Gene 2005, 354: 22\u201327. 10.1016\/j.gene.2005.03.040","journal-title":"Gene"},{"key":"3729_CR27","doi-asserted-by":"publisher","first-page":"4241","DOI":"10.1091\/mbc.11.12.4241","volume":"11","author":"AP Gasch","year":"2000","unstructured":"Gasch AP, Spellman PT, Kao CM, Carmel-Harel O, Eisen MB, Storz G, Botstein D, Brown PO: Genomic expression programs in the response of yeast cells to environmental changes. Mol Biol Cell 2000, 11: 4241\u20134257.","journal-title":"Mol Biol Cell"},{"key":"3729_CR28","doi-asserted-by":"publisher","first-page":"16068","DOI":"10.1073\/pnas.0606909103","volume":"103","author":"CW Whitfield","year":"2006","unstructured":"Whitfield CW, Ben-Shahar Y, Brillet C, Leoncini I, Crauser D, Leconte Y, Rodriguez-Zas S, Robinson GE: Genomic dissection of behavioral maturation in the honey bee. 
Proc Natl Acad Sci USA 2006, 103: 16068\u201316075. 10.1073\/pnas.0606909103","journal-title":"Proc Natl Acad Sci USA"},{"key":"3729_CR29","doi-asserted-by":"publisher","first-page":"202","DOI":"10.1186\/1471-2164-8-202","volume":"8","author":"M Sen Sarma","year":"2007","unstructured":"Sen Sarma M, Whitfield CW, Robinson GE: Species differences in brain gene expression profiles associated with adult behavioral maturation in honey bees. BMC Genomics 2007, 8: 202. 10.1186\/1471-2164-8-202","journal-title":"BMC Genomics"},{"key":"3729_CR30","doi-asserted-by":"publisher","first-page":"631","DOI":"10.1016\/j.jinsphys.2004.11.009","volume":"51","author":"SA Hayward","year":"2005","unstructured":"Hayward SA, Pavlides SC, Tammariello SP, Rinehart JP, Denlinger DL: Temporal expression patterns of diapause-associated genes in flesh fly pupae from the onset of diapause through post-diapause quiescence. J Insect Physiol 2005, 51: 631\u2013640. 10.1016\/j.jinsphys.2004.11.009","journal-title":"J Insect Physiol"},{"key":"3729_CR31","doi-asserted-by":"publisher","first-page":"641","DOI":"10.1016\/j.jinsphys.2004.11.012","volume":"51","author":"S Tachibana","year":"2005","unstructured":"Tachibana S, Numata H, Goto SG: Gene expression of heat-shock proteins (Hsp23, Hsp70 and Hsp90) during and after larval diapause in the blow fly Lucilia sericata. J Insect Physiol 2005, 51: 641\u2013647. 10.1016\/j.jinsphys.2004.11.012","journal-title":"J Insect Physiol"},{"key":"3729_CR32","doi-asserted-by":"publisher","first-page":"564","DOI":"10.1016\/j.conb.2004.08.011","volume":"14","author":"N Hirokawa","year":"2004","unstructured":"Hirokawa N, Takemura R: Molecular motors in neuronal development, intracellular transport and diseases. Curr Opin Neurobiol 2004, 14: 564\u2013573. 
10.1016\/j.conb.2004.08.011","journal-title":"Curr Opin Neurobiol"},{"key":"3729_CR33","doi-asserted-by":"publisher","first-page":"467","DOI":"10.1016\/S0092-8674(03)00111-9","volume":"112","author":"RD Vale","year":"2003","unstructured":"Vale RD: The molecular motor toolbox for intracellular transport. Cell 2003, 112: 467\u2013480. 10.1016\/S0092-8674(03)00111-9","journal-title":"Cell"},{"key":"3729_CR34","doi-asserted-by":"publisher","first-page":"207","DOI":"10.1073\/pnas.0508318102","volume":"103","author":"N Ismail","year":"2006","unstructured":"Ismail N, Robinson GE, Fahrbach SE: Stimulation of muscarinic receptors mimics experience-dependent plasticity in the honey bee brain. Proc Natl Acad Sci USA 2006, 103: 207\u2013211. 10.1073\/pnas.0508318102","journal-title":"Proc Natl Acad Sci USA"},{"key":"3729_CR35","first-page":"415","volume-title":"Proc IEEE Comput Syst Bioinform Conf","author":"RM Podowski","year":"2004","unstructured":"Podowski RM, Cleary JG, Goncharoff NT, Amoutzias G, Hayes WS: AZuRE, a scalable system for automated term disambiguation of gene and protein names. Proc IEEE Comput Syst Bioinform Conf 2004, 415\u2013424."},{"key":"3729_CR36","doi-asserted-by":"publisher","first-page":"163","DOI":"10.1016\/j.csda.2004.07.013","volume":"50","author":"J Li","year":"2006","unstructured":"Li J, Zha H: Two-way Poisson mixture models for simultaneous document classification and word clustering. Computational Statistics and Data Analysis 2006, 50: 163\u2013180. 10.1016\/j.csda.2004.07.013","journal-title":"Computational Statistics and Data Analysis"},{"key":"3729_CR37","volume-title":"Statistical inference","author":"G Casella","year":"2001","unstructured":"Casella G, Berger R: Statistical inference. Duxbury Press; 2001."},{"key":"3729_CR38","doi-asserted-by":"publisher","first-page":"193","DOI":"10.2307\/2530819","volume":"39","author":"MJ Symons","year":"1983","unstructured":"Symons MJ, Grimson RC, Yuan YC: Clustering of rare events. 
Biometrics 1983, 39: 193\u2013205. 10.2307\/2530819","journal-title":"Biometrics"},{"key":"3729_CR39","volume-title":"In Proceedings of the Fourth International Conference on Intelligent Text Processing and Computational Linguistics","author":"S Banerjee","year":"2003","unstructured":"Banerjee S, Pedersen T: The Design, Implementation, and Use of the Ngram Statistic Package. In Proceedings of the Fourth International Conference on Intelligent Text Processing and Computational Linguistics 2003."},{"issue":"Suppl 1","key":"3729_CR40","doi-asserted-by":"publisher","first-page":"S14","DOI":"10.1186\/1471-2105-6-S1-S14","volume":"6","author":"D Hanisch","year":"2005","unstructured":"Hanisch D, Fundel K, Mevissen HT, Zimmer R, Fluck J: ProMiner: rule-based protein and gene entity recognition. BMC Bioinformatics 2005, 6(Suppl 1):S14. 10.1186\/1471-2105-6-S1-S14","journal-title":"BMC Bioinformatics"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-11-272.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,1]],"date-time":"2021-09-01T12:13:57Z","timestamp":1630498437000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-11-272"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2010,5,20]]},"references-count":40,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2010,12]]}},"alternative-id":["3729"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-11-272","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2010,5,20]]},"assertion":[{"value":"8 December 2009","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"20 May 
2010","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"20 May 2010","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"272"}}