{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,11]],"date-time":"2026-03-11T23:29:34Z","timestamp":1773271774670,"version":"3.50.1"},"reference-count":32,"publisher":"Oxford University Press (OUP)","issue":"21","license":[{"start":{"date-parts":[[2016,10,2]],"date-time":"2016-10-02T00:00:00Z","timestamp":1475366400000},"content-version":"vor","delay-in-days":2216,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/2.0\/uk\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2010,11,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Motivation: Term-enrichment analysis facilitates biological interpretation by assigning to experimentally\/computationally obtained data annotation associated with terms from controlled vocabularies. This process usually involves obtaining statistical significance for each vocabulary term and using the most significant terms to describe a given set of biological entities, often associated with weights. Many existing enrichment methods require selections of (arbitrary number of) the most significant entities and\/or do not account for weights of entities. Others either mandate extensive simulations to obtain statistics or assume normal weight distribution. In addition, most methods have difficulty assigning correct statistical significance to terms with few entities.<\/jats:p><jats:p>Results: Implementing the well-known Lugannani\u2013Rice formula, we have developed a novel approach, called SaddleSum, that is free from all the aforementioned constraints and evaluated it against several existing methods. With entity weights properly taken into account, SaddleSum is internally consistent and stable with respect to the choice of number of most significant entities selected. 
Making few assumptions on the input data, the proposed method is universal and can thus be applied to areas beyond analysis of microarrays. Employing asymptotic approximation, SaddleSum provides a term-size-dependent score distribution function that gives rise to accurate statistical significance even for terms with few entities. As a consequence, SaddleSum enables researchers to place confidence in its significance assignments to small terms that are often biologically most specific.<\/jats:p><jats:p>Availability: Our implementation, which uses Bonferroni correction to account for multiple hypotheses testing, is available at http:\/\/www.ncbi.nlm.nih.gov\/CBBresearch\/qmbp\/mn\/enrich\/. Source code for the standalone version can be downloaded from ftp:\/\/ftp.ncbi.nlm.nih.gov\/pub\/qmbpmn\/SaddleSum\/.<\/jats:p><jats:p>Contact: \u00a0yyu@ncbi.nlm.nih.gov<\/jats:p><jats:p>Supplementary information: Supplementary materials are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btq511","type":"journal-article","created":{"date-parts":[[2010,9,10]],"date-time":"2010-09-10T00:18:07Z","timestamp":1284077887000},"page":"2752-2759","source":"Crossref","is-referenced-by-count":17,"title":["Robust and accurate data enrichment statistics via distribution function of sum of weights"],"prefix":"10.1093","volume":"26","author":[{"given":"Aleksandar","family":"Stojmirovi\u0107","sequence":"first","affiliation":[{"name":"National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA"}]},{"given":"Yi-Kuo","family":"Yu","sequence":"additional","affiliation":[{"name":"National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, 
USA"}]}],"member":"286","published-online":{"date-parts":[[2010,9,8]]},"reference":[{"key":"2023012507542521000_B1","doi-asserted-by":"crossref","first-page":"114","DOI":"10.1186\/1471-2105-8-114","article-title":"From genes to functional classes in the study of biological systems","volume":"8","author":"Al-Shahrour","year":"2007","journal-title":"BMC Bioinformatics"},{"key":"2023012507542521000_B2","doi-asserted-by":"crossref","first-page":"3389","DOI":"10.1093\/nar\/25.17.3389","article-title":"Gapped BLAST and PSI-BLAST: a new generation of protein database search programs","volume":"25","author":"Altschul","year":"1997","journal-title":"Nucleic Acids Res."},{"key":"2023012507542521000_B3","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1038\/75556","article-title":"Gene ontology: tool for the unification of biology. The Gene Ontology Consortium","volume":"25","author":"Ashburner","year":"2000","journal-title":"Nat. Genet."},{"key":"2023012507542521000_B4","doi-asserted-by":"crossref","first-page":"W186","DOI":"10.1093\/nar\/gkm323","article-title":"GeneTrail\u2013advanced gene set enrichment analysis","volume":"35","author":"Backes","year":"2007","journal-title":"Nucleic Acids Res."},{"key":"2023012507542521000_B5","doi-asserted-by":"crossref","first-page":"D885","DOI":"10.1093\/nar\/gkn764","article-title":"NCBI GEO: archive for high-throughput functional genomic data","volume":"37","author":"Barrett","year":"2009","journal-title":"Nucleic Acids Res."},{"key":"2023012507542521000_B6","doi-asserted-by":"crossref","first-page":"1129","DOI":"10.1093\/bioinformatics\/bti149","article-title":"Identifying subtle interrelated changes in functional gene categories using continuous measures of gene expression","volume":"21","author":"Ben-Shaul","year":"2005","journal-title":"Bioinformatics"},{"key":"2023012507542521000_B7","doi-asserted-by":"crossref","first-page":"289","DOI":"10.1111\/j.2517-6161.1995.tb02031.x","article-title":"Controlling the false 
discovery rate: a practical and powerful approach to multiple testing","volume":"57","author":"Benjamini","year":"1995","journal-title":"J. R. Stat. Soc."},{"key":"2023012507542521000_B8","doi-asserted-by":"crossref","first-page":"1161","DOI":"10.1093\/bioinformatics\/btl658","article-title":"FIVA: functional information viewer and analyzer extracting biological knowledge from transcriptome data of prokaryotes","volume":"23","author":"Blom","year":"2007","journal-title":"Bioinformatics"},{"key":"2023012507542521000_B9","doi-asserted-by":"crossref","first-page":"W592","DOI":"10.1093\/nar\/gki484","article-title":"T-profiler: scoring the activity of predefined groups of genes using gene expression data","volume":"33","author":"Boorsma","year":"2005","journal-title":"Nucleic Acids Res."},{"key":"2023012507542521000_B10","doi-asserted-by":"crossref","first-page":"3710","DOI":"10.1093\/bioinformatics\/bth456","article-title":"GO::TermFinder\u2013open source software for accessing gene ontology information and finding significantly enriched gene ontology terms associated with a list of genes","volume":"20","author":"Boyle","year":"2004","journal-title":"Bioinformatics"},{"key":"2023012507542521000_B11","doi-asserted-by":"crossref","first-page":"D637","DOI":"10.1093\/nar\/gkm1001","article-title":"The BioGRID Interaction Database: 2008 update","volume":"36","author":"Breitkreutz","year":"2008","journal-title":"Nucleic Acids Res."},{"key":"2023012507542521000_B12","doi-asserted-by":"crossref","first-page":"34","DOI":"10.1186\/1471-2105-5-34","article-title":"Iterative group analysis (iGA): a simple tool to enhance sensitivity and facilitate interpretation of microarray experiments","volume":"5","author":"Breitling","year":"2004","journal-title":"BMC Bioinformatics"},{"key":"2023012507542521000_B13","doi-asserted-by":"crossref","first-page":"193","DOI":"10.1186\/1471-2105-5-193","article-title":"Comparing functional annotation analyses with 
Catmap","volume":"5","author":"Breslin","year":"2004","journal-title":"BMC Bioinformatics"},{"key":"2023012507542521000_B14","doi-asserted-by":"crossref","first-page":"631","DOI":"10.1214\/aoms\/1177728652","article-title":"Saddlepoint approximations in statistics","volume":"25","author":"Daniels","year":"1954","journal-title":"Ann. Math. Stat."},{"key":"2023012507542521000_B15","doi-asserted-by":"crossref","first-page":"37","DOI":"10.2307\/1403269","article-title":"Tail probability approximations","volume":"55","author":"Daniels","year":"1987","journal-title":"Internat. Stat. Rev."},{"key":"2023012507542521000_B16","doi-asserted-by":"crossref","first-page":"e39","DOI":"10.1371\/journal.pcbi.0030039","article-title":"Discovering motifs in ranked lists of DNA sequences","volume":"3","author":"Eden","year":"2007","journal-title":"PLoS Comput. Biol."},{"key":"2023012507542521000_B17","doi-asserted-by":"crossref","first-page":"48","DOI":"10.1186\/1471-2105-10-48","article-title":"GOrilla: a tool for discovery and visualization of enriched go terms in ranked gene lists","volume":"10","author":"Eden","year":"2009","journal-title":"BMC Bioinformatics"},{"key":"2023012507542521000_B18","doi-asserted-by":"crossref","first-page":"980","DOI":"10.1093\/bioinformatics\/btm051","article-title":"Analyzing gene expression data in terms of gene sets: methodological issues","volume":"23","author":"Goeman","year":"2007","journal-title":"Bioinformatics"},{"key":"2023012507542521000_B19","doi-asserted-by":"crossref","first-page":"71","DOI":"10.1093\/bib\/bbl019","article-title":"Enrichment analysis in high-throughput genomics - accounting for dependency in the NULL","volume":"8","author":"Gold","year":"2007","journal-title":"Brief. 
Bioinform."},{"key":"2023012507542521000_B20","doi-asserted-by":"crossref","DOI":"10.1002\/9780470316672","volume-title":"Multiple Comparison Procedures (Wiley Series in Probability and Statistics).","author":"Hochberg","year":"1987"},{"key":"2023012507542521000_B21","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1093\/nar\/gkn923","article-title":"Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists","volume":"37","author":"Huang","year":"2009","journal-title":"Nucleic Acids Res."},{"key":"2023012507542521000_B22","doi-asserted-by":"crossref","first-page":"2264","DOI":"10.1073\/pnas.87.6.2264","article-title":"Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes","volume":"87","author":"Karlin","year":"1990","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023012507542521000_B23","doi-asserted-by":"crossref","first-page":"144","DOI":"10.1186\/1471-2105-6-144","article-title":"PAGE: parametric analysis of gene set enrichment","volume":"6","author":"Kim","year":"2005","journal-title":"BMC Bioinformatics"},{"key":"2023012507542521000_B24","doi-asserted-by":"crossref","first-page":"475","DOI":"10.2307\/1426607","article-title":"Saddle point approximation for the distribution of the sum of independent random variables","volume":"12","author":"Lugannani","year":"1980","journal-title":"Adv. Appl. 
Probab."},{"key":"2023012507542521000_B25","doi-asserted-by":"crossref","first-page":"161","DOI":"10.1186\/1471-2105-10-161","article-title":"GAGE: generally applicable gene set enrichment for pathway analysis","volume":"10","author":"Luo","year":"2009","journal-title":"BMC Bioinformatics"},{"key":"2023012507542521000_B26","doi-asserted-by":"crossref","first-page":"267","DOI":"10.1038\/ng1180","article-title":"PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes","volume":"34","author":"Mootha","year":"2003","journal-title":"Nat. Genet."},{"key":"2023012507542521000_B27","doi-asserted-by":"crossref","first-page":"741","DOI":"10.1038\/nmeth.1373","article-title":"Proteomics strategy for quantitative protein interaction profiling in cell extracts","volume":"6","author":"Sharma","year":"2009","journal-title":"Nat. Methods"},{"key":"2023012507542521000_B28","doi-asserted-by":"crossref","first-page":"2618","DOI":"10.1093\/bioinformatics\/bth293","article-title":"GO-Mapper: functional analysis of gene expression data using the expression level as a score to evaluate gene ontology terms","volume":"20","author":"Smid","year":"2004","journal-title":"Bioinformatics"},{"key":"2023012507542521000_B29","doi-asserted-by":"crossref","first-page":"1115","DOI":"10.1089\/cmb.2007.0069","article-title":"Information flow in interaction networks","volume":"14","author":"Stojmirovi\u0107","year":"2007","journal-title":"J. Comput. 
Biol."},{"key":"2023012507542521000_B30","doi-asserted-by":"crossref","first-page":"2447","DOI":"10.1093\/bioinformatics\/btp398","article-title":"ITM Probe: analyzing information flow in protein networks","volume":"25","author":"Stojmirovi\u0107","year":"2009","journal-title":"Bioinformatics"},{"key":"2023012507542521000_B31","doi-asserted-by":"crossref","first-page":"15545","DOI":"10.1073\/pnas.0506580102","article-title":"Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles","volume":"102","author":"Subramanian","year":"2005","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023012507542521000_B32","doi-asserted-by":"crossref","first-page":"956","DOI":"10.1126\/science.1160342","article-title":"A global view of gene activity and alternative splicing by deep sequencing of the human transcriptome","volume":"321","author":"Sultan","year":"2008","journal-title":"Science"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/26\/21\/2752\/48852240\/bioinformatics_26_21_2752.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/26\/21\/2752\/48852240\/bioinformatics_26_21_2752.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,2,25]],"date-time":"2025-02-25T17:42:05Z","timestamp":1740505325000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/26\/21\/2752\/213760"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2010,9,8]]},"references-count":32,"journal-issue":{"issue":"21","published-print":{"date-parts":[[2010,11,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btq511","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electr
onic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2010,11,1]]},"published":{"date-parts":[[2010,9,8]]}}}