{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,18]],"date-time":"2025-12-18T19:42:37Z","timestamp":1766086957040},"reference-count":24,"publisher":"Oxford University Press (OUP)","issue":"19","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2014,10,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Motivation: Gene set analysis is the analysis of a set of genes that collectively contribute to a biological process. Most popular gene set analysis methods are based on empirical P-values, which require a large number of permutations. Despite numerous gene set analysis methods developed in the past decade, the most popular methods still suffer from serious limitations.<\/jats:p><jats:p>Results: We present a gene set analysis method (mGSZ) based on the Gene Set Z-scoring function (GSZ) and asymptotic P-values. Asymptotic P-value calculation requires fewer permutations, and thus speeds up the gene set analysis process. We compare the GSZ scoring function with seven popular gene set scoring functions and show that GSZ stands out as the best scoring function. In addition, we show improved performance of the GSA method when the max-mean statistic is replaced by the GSZ scoring function. We demonstrate the importance of both gene and sample permutations by showing the consequences in the absence of one or the other. A comparison of asymptotic and empirical methods of P-value estimation demonstrates a clear advantage of asymptotic P-values over empirical P-values. We show that mGSZ outperforms the state-of-the-art methods based on two different evaluations. We compared mGSZ results with permutation and rotation tests and show that rotation does not improve our asymptotic P-values. 
We also propose well-known asymptotic distribution models for three of the compared methods.<\/jats:p><jats:p>Availability and implementation: mGSZ is available as an R package from cran.r-project.org.<\/jats:p><jats:p>Contact: pashupati.mishra@helsinki.fi<\/jats:p><jats:p>Supplementary information: Available at http:\/\/ekhidna.biocenter.helsinki.fi\/downloads\/pashupati\/mGSZ.html<\/jats:p>","DOI":"10.1093\/bioinformatics\/btu374","type":"journal-article","created":{"date-parts":[[2014,6,6]],"date-time":"2014-06-06T01:53:57Z","timestamp":1402019637000},"page":"2747-2756","source":"Crossref","is-referenced-by-count":17,"title":["Gene set analysis: limitations in popular existing methods and proposed improvements"],"prefix":"10.1093","volume":"30","author":[{"given":"Pashupati","family":"Mishra","sequence":"first","affiliation":[{"name":"1 Institute of Biotechnology, University of Helsinki, Helsinki, Finland and 2 CSC - IT Center for Science, Ltd., Espoo, Finland"}]},{"given":"Petri","family":"T\u00f6r\u00f6nen","sequence":"additional","affiliation":[{"name":"1 Institute of Biotechnology, University of Helsinki, Helsinki, Finland and 2 CSC - IT Center for Science, Ltd., Espoo, Finland"}]},{"given":"Yrj\u00f6","family":"Leino","sequence":"additional","affiliation":[{"name":"1 Institute of Biotechnology, University of Helsinki, Helsinki, Finland and 2 CSC - IT Center for Science, Ltd., Espoo, Finland"}]},{"given":"Liisa","family":"Holm","sequence":"additional","affiliation":[{"name":"1 Institute of Biotechnology, University of Helsinki, Helsinki, Finland and 2 CSC - IT Center for Science, Ltd., Espoo, Finland"}]}],"member":"286","published-online":{"date-parts":[[2014,6,5]]},"reference":[{"key":"2023041303363207800_","doi-asserted-by":"crossref","first-page":"47","DOI":"10.1186\/1471-2105-10-47","article-title":"A general modular framework for gene set enrichment analysis","volume":"10","author":"Ackermann","year":"2009","journal-title":"BMC 
Bioinformatics"},{"key":"2023041303363207800_","doi-asserted-by":"crossref","first-page":"41","DOI":"10.1038\/ng765","article-title":"Mll translocations specify a distinct gene expression profile that distinguishes a unique leukemia","volume":"30","author":"Armstrong","year":"2002","journal-title":"Nat. Genet."},{"key":"2023041303363207800_","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1038\/75556","article-title":"Gene ontology: tool for the unification of biology. The gene ontology consortium","volume":"25","author":"Ashburner","year":"2000","journal-title":"Nat. Genet."},{"key":"2023041303363207800_","doi-asserted-by":"crossref","first-page":"242","DOI":"10.1186\/1471-2105-8-242","article-title":"Improving gene set analysis of microarray data by sam-gs","volume":"8","author":"Dinu","year":"2007","journal-title":"BMC Bioinformatics"},{"key":"2023041303363207800_","doi-asserted-by":"crossref","DOI":"10.2202\/1544-6115.1418","article-title":"Rotation testing in gene set enrichment analysis for small direct comparison experiments","volume":"8","author":"D\u00f8rum","year":"2009","journal-title":"Stat. Appl. Genet. Mol. Biol."},{"key":"2023041303363207800_","first-page":"107","article-title":"On testing the significance of sets of genes","volume":"1","author":"Efron","year":"2006","journal-title":"Ann. Appl. 
Stat."},{"key":"2023041303363207800_","doi-asserted-by":"crossref","first-page":"D866","DOI":"10.1093\/nar\/gkm815","article-title":"Many microbe microarrays database: uniformly normalized affymetrix compendia with structured experimental metadata","volume":"36","author":"Faith","year":"2008","journal-title":"Nucleic Acids Res."},{"key":"2023041303363207800_","doi-asserted-by":"crossref","first-page":"980","DOI":"10.1093\/bioinformatics\/btm051","article-title":"Analyzing gene expression data in terms of gene sets: methodological issues","volume":"23","author":"Goeman","year":"2007","journal-title":"Bioinformatics"},{"key":"2023041303363207800_","volume-title":"ismev: An Introduction to Statistical Modeling of Extreme Values","author":"Heffernan","year":"2012"},{"key":"2023041303363207800_","doi-asserted-by":"crossref","first-page":"565","DOI":"10.1177\/0962280209351908","article-title":"Gene set enrichment analysis made simple","volume":"18","author":"Irizarry","year":"2009","journal-title":"Stat. Methods Med. 
Res."},{"key":"2023041303363207800_","doi-asserted-by":"crossref","first-page":"27","DOI":"10.1093\/nar\/28.1.27","article-title":"Kegg: Kyoto encyclopedia of genes and genomes","volume":"28","author":"Kanehisa","year":"2000","journal-title":"Nucleic Acids Res."},{"key":"2023041303363207800_","doi-asserted-by":"crossref","first-page":"144","DOI":"10.1186\/1471-2105-6-144","article-title":"Page: parametric analysis of gene set enrichment","volume":"6","author":"Kim","year":"2005","journal-title":"BMC Bioinformatics"},{"key":"2023041303363207800_","doi-asserted-by":"crossref","first-page":"i161","DOI":"10.1093\/bioinformatics\/btp211","article-title":"Fewer permutations, more accurate p-values","volume":"25","author":"Knijnenburg","year":"2009","journal-title":"Bioinformatics"},{"key":"2023041303363207800_","doi-asserted-by":"crossref","first-page":"267","DOI":"10.1038\/ng1180","article-title":"Pgc-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes","volume":"34","author":"Mootha","year":"2003","journal-title":"Nat. Genet."},{"key":"2023041303363207800_","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1093\/bioinformatics\/bts164","article-title":"Rigorous assessment of gene set enrichment tests","volume":"28","author":"Naeem","year":"2012","journal-title":"Bioinformatics"},{"key":"2023041303363207800_","doi-asserted-by":"crossref","first-page":"85","DOI":"10.1214\/07-AOAS104","article-title":"Random-set methods identify distinct aspects of the enrichment signal in gene-set analysis","volume":"1","author":"Newton","year":"2007","journal-title":"Ann. Appl. Stat."},{"key":"2023041303363207800_","doi-asserted-by":"crossref","first-page":"39","DOI":"10.2202\/1544-6115.1585","article-title":"Permutation p-values should never be zero: calculating exact p-values when permutations are randomly drawn","volume":"9","author":"Phipson","year":"2010","journal-title":"Stat. Appl. Genet. Mol. 
Biol."},{"key":"2023041303363207800_","doi-asserted-by":"crossref","first-page":"5539","DOI":"10.1093\/nar\/gkh894","article-title":"The funcat, a functional annotation scheme for systematic classification of proteins from whole genomes","volume":"32","author":"Ruepp","year":"2004","journal-title":"Nucleic Acids Res."},{"key":"2023041303363207800_","doi-asserted-by":"crossref","first-page":"15545","DOI":"10.1073\/pnas.0506580102","article-title":"Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles","volume":"102","author":"Subramanian","year":"2005","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023041303363207800_","doi-asserted-by":"crossref","first-page":"13544","DOI":"10.1073\/pnas.0506577102","article-title":"Discovering statistically significant pathways in expression profiling studies","volume":"102","author":"Tian","year":"2005","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023041303363207800_","doi-asserted-by":"crossref","first-page":"307","DOI":"10.1186\/1471-2105-10-307","article-title":"Robust extraction of functional signals from gene set analysis using a generalized threshold free scoring function","volume":"10","author":"T\u00f6r\u00f6nen","year":"2009","journal-title":"BMC Bioinformatics"},{"key":"2023041303363207800_","doi-asserted-by":"crossref","DOI":"10.1007\/978-0-387-21706-2","volume-title":"Modern Applied Statistics with S-plus","author":"Venables","year":"2002","edition":"4th edn"},{"key":"2023041303363207800_","doi-asserted-by":"crossref","first-page":"e133","DOI":"10.1093\/nar\/gks461","article-title":"Camera: a competitive gene set test accounting for inter-gene correlation","volume":"40","author":"Wu","year":"2012","journal-title":"Nucleic Acids Res."},{"key":"2023041303363207800_","doi-asserted-by":"crossref","first-page":"2176","DOI":"10.1093\/bioinformatics\/btq401","article-title":"Roast: rotation gene set tests for complex microarray 
experiments","volume":"26","author":"Wu","year":"2010","journal-title":"Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/30\/19\/2747\/49872443\/bioinformatics_30_19_2747.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/30\/19\/2747\/49872443\/bioinformatics_30_19_2747.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,7,14]],"date-time":"2023-07-14T00:25:03Z","timestamp":1689294303000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/30\/19\/2747\/2422190"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2014,6,5]]},"references-count":24,"journal-issue":{"issue":"19","published-print":{"date-parts":[[2014,10,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btu374","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2014,10]]},"published":{"date-parts":[[2014,6,5]]}}}