{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,14]],"date-time":"2026-02-14T13:33:40Z","timestamp":1771076020115,"version":"3.50.1"},"reference-count":26,"publisher":"Springer Science and Business Media LLC","issue":"1","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2007,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:sec>\n            <jats:title>Background<\/jats:title>\n            <jats:p>The search for enriched features has become widely used to characterize a set of genes or proteins. A key aspect of this technique is its ability to identify correlations amongst heterogeneous data such as Gene Ontology annotations, gene expression data and genome location of genes. Despite the rapid growth of available data, very little has been proposed in terms of formalization and optimization. Additionally, current methods mainly ignore the structure of the data which causes results redundancy. For example, when searching for enrichment in GO terms, genes can be annotated with multiple GO terms and should be propagated to the more general terms in the Gene Ontology. Consequently, the gene sets often overlap partially or totally, and this causes the reported enriched GO terms to be both numerous and redundant, hence, overwhelming the researcher with non-pertinent information. This situation is not unique, it arises whenever some hierarchical clustering is performed (<jats:italic>e.g<\/jats:italic>. based on the gene expression profiles), the extreme case being when genes that are neighbors on the chromosomes are considered.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Results<\/jats:title>\n            <jats:p>We present a generic framework to efficiently identify the most pertinent over-represented features in a set of genes. We propose a formal representation of gene sets based on the theory of partially ordered sets (posets), and give a formal definition of target set pertinence. Algorithms and compact representations of target sets are provided for the generation and the evaluation of the pertinent target sets. The relevance of our method is illustrated through the search for enriched GO annotations in the proteins involved in a multiprotein complex. The results obtained demonstrate the gain in terms of pertinence (up to 64% redundancy removed), space requirements (up to 73% less storage) and efficiency (up to 98% less comparisons).<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Conclusion<\/jats:title>\n            <jats:p>The generic framework presented in this article provides a formal approach to adequately represent available data and efficiently search for pertinent over-represented features in a set of genes or proteins. The formalism and the pertinence definition can be directly used by most of the methods and tools currently available for feature enrichment analysis.<\/jats:p>\n          <\/jats:sec>","DOI":"10.1186\/1471-2105-8-332","type":"journal-article","created":{"date-parts":[[2007,9,11]],"date-time":"2007-09-11T18:13:28Z","timestamp":1189534408000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":12,"title":["How to decide which are the most pertinent overly-represented features during gene set enrichment analysis"],"prefix":"10.1186","volume":"8","author":[{"given":"Roland","family":"Barriot","sequence":"first","affiliation":[]},{"given":"David J","family":"Sherman","sequence":"additional","affiliation":[]},{"given":"Isabelle","family":"Dutour","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2007,9,11]]},"reference":[{"issue":"18","key":"1704_CR1","doi-asserted-by":"publisher","first-page":"3587","DOI":"10.1093\/bioinformatics\/bti565","volume":"21","author":"P Khatri","year":"2005","unstructured":"Khatri P, Draghici S: Ontological analysis of gene expression data: current tools, limitations, and open problems. Bioinformatics. 2005, 21 (18): 3587-3595. 10.1093\/bioinformatics\/bti565.","journal-title":"Bioinformatics"},{"key":"1704_CR2","doi-asserted-by":"publisher","first-page":"25","DOI":"10.1038\/75556","volume":"25","author":"The Gene Ontology Consortium","year":"2000","unstructured":"The Gene Ontology Consortium: Gene Ontology: tool for the unification of biology. Nat Genet. 2000, 25: 25-29. 10.1038\/75556.","journal-title":"Nat Genet"},{"issue":"suppl 1","key":"1704_CR3","first-page":"D154","volume":"33","author":"A Bairoch","year":"2005","unstructured":"Bairoch A, Apweiler R, Wu CH, Barker WC, Boeckmann B, Ferro S, Gasteiger E, Huang H, Lopez R, Magrane M, Martin MJ, Natale DA, O'Donovan C, Redaschi N, Yeh LSL: The Universal Protein Resource (UniProt). Nucleic Acids Res. 2005, 33 (suppl 1): D154-159.","journal-title":"Nucleic Acids Res"},{"key":"1704_CR4","doi-asserted-by":"publisher","first-page":"27","DOI":"10.1093\/nar\/28.1.27","volume":"28","author":"M Kanehisa","year":"2000","unstructured":"Kanehisa M, Goto S: KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucl Acids Res. 2000, 28: 27-30. 10.1093\/nar\/28.1.27.","journal-title":"Nucl Acids Res"},{"key":"1704_CR5","volume-title":"The Delphic boat: what genomes tell us. translated by Alison Quayle","author":"A Danchin","year":"2002","unstructured":"Danchin A: The Delphic boat: what genomes tell us. translated by Alison Quayle. 2002, Cambridge, MA: Harvard University Press"},{"issue":"5","key":"1704_CR6","doi-asserted-by":"publisher","first-page":"383","DOI":"10.1093\/bioinformatics\/14.5.383","volume":"14","author":"A Danchin","year":"1998","unstructured":"Danchin A: The Delphic boat or what the genomic texts tell us. Bioinformatics. 1998, 14 (5): 383-10.1093\/bioinformatics\/14.5.383.","journal-title":"Bioinformatics"},{"issue":"12","key":"1704_CR7","doi-asserted-by":"publisher","first-page":"3581","DOI":"10.1093\/nar\/gkh681","volume":"32","author":"R Barriot","year":"2004","unstructured":"Barriot R, Poix J, Groppi A, Barr\u00e9 A, Goffard N, Sherman D, Dutour I, de Daruvar A: New strategy for the representation and the integration of biomolecular knowledge at a cellular scale. Nucleic Acids Research. 2004, 32 (12): 3581-3589. 10.1093\/nar\/gkh681.","journal-title":"Nucleic Acids Research"},{"key":"1704_CR8","volume-title":"Lattice theory","author":"G Birkhoff","year":"1967","unstructured":"Birkhoff G: Lattice theory. 1967, American Mathematical Society, Providence, 3","edition":"3"},{"key":"1704_CR9","doi-asserted-by":"publisher","first-page":"35","DOI":"10.1186\/1471-2105-3-35","volume":"3","author":"M Robinson","year":"2002","unstructured":"Robinson M, Grigull J, Mohammad N, Hughes T: FunSpec: a web-based cluster interpreter for yeast. BMC Bioinformatics. 2002, 3: 35-10.1186\/1471-2105-3-35.","journal-title":"BMC Bioinformatics"},{"issue":"9","key":"1704_CR10","doi-asserted-by":"publisher","first-page":"1464","DOI":"10.1093\/bioinformatics\/bth088","volume":"20","author":"T Beissbarth","year":"2004","unstructured":"Beissbarth T, Speed TP: GOstat: find statistically overrepresented Gene Ontologies within a group of genes. Bioinformatics. 2004, 20 (9): 1464-1465. 10.1093\/bioinformatics\/bth088.","journal-title":"Bioinformatics"},{"issue":"10","key":"1704_CR11","doi-asserted-by":"publisher","first-page":"R70","DOI":"10.1186\/gb-2003-4-10-r70","volume":"4","author":"D Hosack","year":"2003","unstructured":"Hosack D, Dennis G, Sherman B, Lane H, Lempicki R: Identifying biological themes within lists of genes with EASE. Genome Biology. 2003, 4 (10): R70-10.1186\/gb-2003-4-10-r70.","journal-title":"Genome Biology"},{"issue":"19","key":"1704_CR12","doi-asserted-by":"publisher","first-page":"5617","DOI":"10.1093\/nar\/gkg769","volume":"31","author":"N Kaplan","year":"2003","unstructured":"Kaplan N, Vaaknin A, Linial M: PANDORA: keyword-based analysis of protein sets by integration of annotation sources. Nucl Acids Res. 2003, 31 (19): 5617-5626. 10.1093\/nar\/gkg769.","journal-title":"Nucl Acids Res"},{"issue":"8","key":"1704_CR13","doi-asserted-by":"publisher","first-page":"2533","DOI":"10.1093\/nar\/gkm054","volume":"35","author":"S Van Vooren","year":"2007","unstructured":"Van Vooren S, Thienpont B, Menten B, Speleman F, Moor BD, Vermeesch J, Moreau Y: Mapping biomedical concepts onto the human genome by mining literature on chromosomal aberrations. Nucl Acids Res. 2007, 35 (8): 2533-2543. 10.1093\/nar\/gkm054.","journal-title":"Nucl Acids Res"},{"issue":"17","key":"1704_CR14","doi-asserted-by":"publisher","first-page":"3575","DOI":"10.1093\/bioinformatics\/bti574","volume":"21","author":"G Wrobel","year":"2005","unstructured":"Wrobel G, Chalmel F, Primig M: goCluster integrates statistical analysis and functional interpretation of microarray expression data. Bioinformatics. 2005, 21 (17): 3575-3577. 10.1093\/bioinformatics\/bti574.","journal-title":"Bioinformatics"},{"key":"1704_CR15","volume-title":"Enzyme Nomenclature: Recommendations (1992) of the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology","author":"IUBMB","year":"1992","unstructured":"IUBMB: Enzyme Nomenclature: Recommendations (1992) of the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology. 1992, Academic Press, San Diego, CA"},{"key":"1704_CR16","volume-title":"PhD thesis","author":"R Barriot","year":"2005","unstructured":"Barriot R: Int\u00e9gration des connaissances biologiques \u00e0 l'\u00e9chelle de la cellule. PhD thesis. 2005, Universit\u00e9 Bordeaux 1, Laboratoire Bordelais de Recherche en Informatique"},{"issue":"suppl 1","key":"1704_CR17","doi-asserted-by":"publisher","first-page":"D169","DOI":"10.1093\/nar\/gkj148","volume":"34","author":"HW Mewes","year":"2006","unstructured":"Mewes HW, Frishman D, Mayer KFX, Munsterkotter M, Noubibou O, Pagel P, Rattei T, Oesterheld M, Ruepp A, Stumpflen V: MIPS: analysis and annotation of proteins from whole genomes in 2005. Nucl Acids Res. 2006, 34 (suppl 1): D169-172. 10.1093\/nar\/gkj148.","journal-title":"Nucl Acids Res"},{"issue":"suppl 1","key":"1704_CR18","doi-asserted-by":"publisher","first-page":"D432","DOI":"10.1093\/nar\/gkj160","volume":"34","author":"D Sherman","year":"2006","unstructured":"Sherman D, Durrens P, Iragne F, Beyne E, Nikolski M, Souciet JL: Genolevures complete genomes provide data and tools for comparative genomics of hemiascomycetous yeasts. Nucl Acids Res. 2006, 34 (suppl 1): D432-435. 10.1093\/nar\/gkj160.","journal-title":"Nucl Acids Res"},{"key":"1704_CR19","unstructured":"Saccharomyces Genome Database. [http:\/\/www.yeastgenome.org\/]"},{"issue":"19","key":"1704_CR20","doi-asserted-by":"publisher","first-page":"7238","DOI":"10.1128\/MCB.20.19.7238-7246.2000","volume":"20","author":"A Colley","year":"2000","unstructured":"Colley A, Beggs JD, Tollervey D, Lafontaine DLJ: Dhr1p, a Putative DEAH-Box RNA Helicase, Is Associated with the Box C+D snoRNP U3. Mol Cell Biol. 2000, 20 (19): 7238-7246. 10.1128\/MCB.20.19.7238-7246.2000.","journal-title":"Mol Cell Biol"},{"key":"1704_CR21","volume-title":"Data Mining. Concepts and Techniques","author":"J Han","year":"2006","unstructured":"Han J, Kamber M: Data Mining. Concepts and Techniques. 2006, Morgan Kaufmann, 2","edition":"2"},{"issue":"suppl_1","key":"1704_CR22","doi-asserted-by":"publisher","first-page":"i169","DOI":"10.1093\/bioinformatics\/bth921","volume":"20","author":"CA Joslyn","year":"2004","unstructured":"Joslyn CA, Mniszewski SM, Fulmer A, Heaton G: The Gene Ontology Categorizer. Bioinformatics. 2004, 20 (suppl_1): i169-177. 10.1093\/bioinformatics\/bth921.","journal-title":"Bioinformatics"},{"issue":"suppl_1","key":"1704_CR23","doi-asserted-by":"publisher","first-page":"D322","DOI":"10.1093\/nar\/gkl799","volume":"35","author":"G Alterovitz","year":"2007","unstructured":"Alterovitz G, Xiang M, Mohan M, Ramoni MF: GO PaD: the Gene Ontology Partition Database. Nucl Acids Res. 2007, 35 (suppl_1): D322-327. 10.1093\/nar\/gkl799.","journal-title":"Nucl Acids Res"},{"issue":"18","key":"1704_CR24","doi-asserted-by":"publisher","first-page":"2249","DOI":"10.1093\/bioinformatics\/btl378","volume":"22","author":"D Nam","year":"2006","unstructured":"Nam D, Kim SB, Kim SK, Yang S, Kim SY, Chu IS: ADGO: analysis of differentially expressed gene sets using composite GO annotation. Bioinformatics. 2006, 22 (18): 2249-2253. 10.1093\/bioinformatics\/btl378.","journal-title":"Bioinformatics"},{"key":"1704_CR25","doi-asserted-by":"publisher","first-page":"R3","DOI":"10.1186\/gb-2007-8-1-r3","volume":"8","author":"P Carmona-Saez","year":"2007","unstructured":"Carmona-Saez P, Chagoyen M, Tirado F, Carazo J, Pascual-Montano A: GENECODIS: a web-based tool for finding significant concurrent annotations in gene lists. Genome Biology. 2007, 8: R3-10.1186\/gb-2007-8-1-r3.","journal-title":"Genome Biology"},{"issue":"16","key":"1704_CR26","doi-asserted-by":"publisher","first-page":"2020","DOI":"10.1093\/bioinformatics\/btl334","volume":"22","author":"S Myhre","year":"2006","unstructured":"Myhre S, Tveit H, Mollestad T, Lagreid A: Additional Gene Ontology structure for improved biological reasoning. Bioinformatics. 2006, 22 (16): 2020-2027. 10.1093\/bioinformatics\/btl334.","journal-title":"Bioinformatics"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-8-332.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,1]],"date-time":"2021-09-01T01:55:11Z","timestamp":1630461311000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-8-332"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2007,9,11]]},"references-count":26,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2007,12]]}},"alternative-id":["1704"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-8-332","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2007,9,11]]},"assertion":[{"value":"5 December 2006","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"11 September 2007","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"11 September 2007","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"332"}}