{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,7]],"date-time":"2026-02-07T15:52:43Z","timestamp":1770479563485,"version":"3.49.0"},"reference-count":36,"publisher":"Oxford University Press (OUP)","issue":"7","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2009,4,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Development of high-throughput technology makes it possible to measure expressions of thousands of genes simultaneously. Genes have the inherent pathway structure, where pathways are composed of multiple genes with coordinated biological functions. It is of great interest to identify differential gene pathways that are associated with the variations of phenotypes.<\/jats:p>\n               <jats:p>Results: We propose the following approach for detecting differential gene pathways. First, we construct gene pathways using databases such as KEGG or GO. Second, for each pathway, we extract a small number of representative features, which are linear combinations of gene expressions and\/or their transformations. Specifically, we propose using (i) principal components (PCs) of gene expression sets, (ii) PCs of expanded gene expression sets and (iii) expanded sets of PCs of gene expressions, as the representative features. Third, we identify differential gene pathways as those with representative features significantly associated with the variations of phenotypes, particularly disease clinical outcomes, in regression models. The false discovery rate approach is used to adjust for multiple comparisons. 
Analysis of three gene expression datasets suggests that (i) the proposed approach can effectively identify differential gene pathways; (ii) PCs that explain only a small amount of variations of gene expressions may bear significant associations between gene pathways and phenotypes; (iii) including second-order terms of gene expressions may lead to identification of new differential gene pathways; (iv) the proposed approach is relatively insensitive to additional noises; and (v) the proposed approach can identify gene pathways missed by alternative approaches.<\/jats:p>\n               <jats:p>Contact: \u00a0shuangge.ma@yale.edu<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btp085","type":"journal-article","created":{"date-parts":[[2009,2,18]],"date-time":"2009-02-18T03:25:13Z","timestamp":1234927513000},"page":"882-889","source":"Crossref","is-referenced-by-count":70,"title":["Identification of differential gene pathways with principal component analysis"],"prefix":"10.1093","volume":"25","author":[{"given":"Shuangge","family":"Ma","sequence":"first","affiliation":[{"name":"1 Department of Epidemiology and Public Health, Yale University, New Haven, CT 06510 and 2Department of Biostatistics, University of North Carolina, Chapel Hill, NC 27599, USA"}]},{"given":"Michael R.","family":"Kosorok","sequence":"additional","affiliation":[{"name":"1 Department of Epidemiology and Public Health, Yale University, New Haven, CT 06510 and 2Department of Biostatistics, University of North Carolina, Chapel Hill, NC 27599, USA"}]}],"member":"286","published-online":{"date-parts":[[2009,2,17]]},"reference":[{"key":"2023013110150017000_B1","doi-asserted-by":"crossref","first-page":"47","DOI":"10.1186\/1471-2105-10-47","article-title":"A general modular framework for gene set enrichment 
analysis","volume":"10","author":"Ackermann","year":"2009","journal-title":"BMC Bioinformatics"},{"key":"2023013110150017000_B2","doi-asserted-by":"crossref","first-page":"55","DOI":"10.1038\/nrg1749","article-title":"Microarray data analysis: from disarray to consolidation and consensus","volume":"7","author":"Allison","year":"2006","journal-title":"Nat. Rev. Genet."},{"key":"2023013110150017000_B3","doi-asserted-by":"crossref","first-page":"1165","DOI":"10.1214\/aos\/1013699998","article-title":"The control of the false discovery rate in multiple testing under dependency","volume":"29","author":"Benjamini","year":"2001","journal-title":"Ann. Stat."},{"key":"2023013110150017000_B4","doi-asserted-by":"crossref","first-page":"237","DOI":"10.1038\/ng998","article-title":"Genome-wide association study and mouse model identify interaction between RET and EDNRB pathways in Hirschsprung disease","volume":"32","author":"Carrasquillo","year":"2002","journal-title":"Nat. Genet."},{"key":"2023013110150017000_B5","doi-asserted-by":"crossref","first-page":"2474","DOI":"10.1093\/bioinformatics\/btn458","article-title":"Supervised principal component analysis for gene set enrichment of microarray data with continuous or survival outcomes","volume":"24","author":"Chen","year":"2008","journal-title":"Bioinformatics"},{"key":"2023013110150017000_B6","doi-asserted-by":"crossref","first-page":"429","DOI":"10.1016\/j.tibtech.2005.05.011","article-title":"Pathways to the analysis of microarray data","volume":"23","author":"Curtis","year":"2005","journal-title":"Trends Biotechnol."},{"key":"2023013110150017000_B7","doi-asserted-by":"crossref","first-page":"107","DOI":"10.1214\/07-AOAS101","article-title":"On testing the significance of sets of genes","volume":"1","author":"Efron","year":"2007","journal-title":"Ann. Appl. 
Stat."},{"key":"2023013110150017000_B8","doi-asserted-by":"crossref","first-page":"980","DOI":"10.1093\/bioinformatics\/btm051","article-title":"Analyzing gene expression data in terms of gene sets: Methodological issues","volume":"23","author":"Goeman","year":"2007","journal-title":"Bioinformatics"},{"key":"2023013110150017000_B9","doi-asserted-by":"crossref","first-page":"93","DOI":"10.1093\/bioinformatics\/btg382","article-title":"A global test for groups of genes: testing association with a clinical outcome","volume":"20","author":"Goeman","year":"2004","journal-title":"Bioinformatics"},{"key":"2023013110150017000_B10","doi-asserted-by":"crossref","first-page":"531","DOI":"10.1126\/science.286.5439.531","article-title":"Molecular classification of cancer: class discovery and class prediction by gene expression monitoring","volume":"286","author":"Golub","year":"1999","journal-title":"Science"},{"key":"2023013110150017000_B11","first-page":"14","article-title":"Tyrosine metabolism in leukemia","volume":"16","author":"Ivanova","year":"1971","journal-title":"Probl. Gematol. I Pereliv. 
Krovi."},{"key":"2023013110150017000_B12","doi-asserted-by":"crossref","first-page":"306","DOI":"10.1093\/bioinformatics\/btl599","article-title":"Extensions to gene set enrichment","volume":"23","author":"Jiang","year":"2007","journal-title":"Bioinformatics"},{"key":"2023013110150017000_B13","volume-title":"Applied Multivariate Statistical Analysis.","author":"Johnson","year":"2001"},{"key":"2023013110150017000_B14","doi-asserted-by":"crossref","DOI":"10.1002\/0470041102","volume-title":"Cancer Diagnostics with DNA Microarrays.","author":"Knudsen","year":"2006"},{"key":"2023013110150017000_B15","doi-asserted-by":"crossref","first-page":"2373","DOI":"10.1093\/bioinformatics\/btl401","article-title":"A multivariate approach for integrating genome-wide expression data and biological knowledge","volume":"22","author":"Kong","year":"2006","journal-title":"Bioinformatics"},{"key":"2023013110150017000_B16","volume-title":"Introduction to Bioinformatics.","author":"Lesk","year":"2002"},{"key":"2023013110150017000_B17","article-title":"Analyzing Microarray Gene Expression Data","author":"McLachlan","year":"2004","journal-title":"Wiley-Interscience"},{"key":"2023013110150017000_B18","doi-asserted-by":"crossref","first-page":"148","DOI":"10.1007\/BF00199116","article-title":"In vitro modulation of natural killer cell activity in non-Hodgkin's lymphoma patients after therapy","volume":"28","author":"Mehta","year":"1989","journal-title":"Cancer Immunol. Immunother."},{"key":"2023013110150017000_B19","first-page":"118","article-title":"Molecular control of the cell cycle in cancer: biological and clinical aspects","volume":"50","author":"Moller","year":"2003","journal-title":"Dan. Med. Bull."},{"key":"2023013110150017000_B20","doi-asserted-by":"crossref","first-page":"189","DOI":"10.1093\/bib\/bbn001","article-title":"Gene-set approach for expression pattern analysis","volume":"9","author":"Nam","year":"2008","journal-title":"Brief. 
Bioinform."},{"key":"2023013110150017000_B21","doi-asserted-by":"crossref","first-page":"176","DOI":"10.1111\/1523-1747.ep12543616","article-title":"Natural cell-mediated cytotoxicity in cutaneous T-cell lymphomas","volume":"81","author":"Neilan","year":"1983","journal-title":"J. Invest. Dermatol."},{"key":"2023013110150017000_B22","doi-asserted-by":"crossref","first-page":"192","DOI":"10.1093\/bioinformatics\/btm583","article-title":"Identification of differentially expressed gene categories in microarray studies using nonparametric multivariate analysis","volume":"24","author":"Nettleton","year":"2008","journal-title":"Bioinformatics"},{"key":"2023013110150017000_B23","doi-asserted-by":"crossref","first-page":"87","DOI":"10.1186\/1471-2105-9-87","article-title":"Building pathway clusters from random forests classification using class votes","volume":"9","author":"Pang","year":"2008","journal-title":"BMC Bioinformatics"},{"key":"2023013110150017000_B24","doi-asserted-by":"crossref","first-page":"1937","DOI":"10.1056\/NEJMoa012914","article-title":"The use of molecular profiling to predict survival after chemotherapy for diffuse large-B-cell lymphoma","volume":"346","author":"Rosenwald","year":"2002","journal-title":"NEJM"},{"key":"2023013110150017000_B25","doi-asserted-by":"crossref","first-page":"185","DOI":"10.1016\/S1535-6108(03)00028-X","article-title":"The proliferation gene expression signature is a quantitative integrator of oncogenic events that predicts survival in mantle cell lymphoma","volume":"3","author":"Rosenwald","year":"2003","journal-title":"Cancer Cell"},{"key":"2023013110150017000_B26","doi-asserted-by":"crossref","first-page":"2548","DOI":"10.1093\/bioinformatics\/bti343","article-title":"A web-based tool for principal component and significance analysis of microarray 
data","volume":"21","author":"Sharov","year":"2005","journal-title":"Bioinformatics"},{"key":"2023013110150017000_B27","doi-asserted-by":"crossref","first-page":"361","DOI":"10.1007\/s10142-008-0084-9","article-title":"Identifying subset of genes that have influential impacts on cancer progression: a new approach to analyze cancer microarray data","volume":"8","author":"Shi","year":"2008","journal-title":"Funct. Integr. Genomics"},{"key":"2023013110150017000_B28","doi-asserted-by":"crossref","first-page":"4419","DOI":"10.1158\/0008-5472.CAN-03-3885","article-title":"High-throughput retroviral tagging for identification of genes involved in initiation and progression of mouse splenic marginal zone lymphomas","volume":"64","author":"Shin","year":"2004","journal-title":"Cancer Res."},{"key":"2023013110150017000_B29","author":"Sneddon","year":"2004","journal-title":"Pathway analysis. SoCalBSI 2004."},{"key":"2023013110150017000_B30","doi-asserted-by":"crossref","first-page":"15545","DOI":"10.1073\/pnas.0506580102","article-title":"Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles","volume":"102","author":"Subramanian","year":"2005","journal-title":"Proc. Natl Acad. Sci. 
USA"},{"key":"2023013110150017000_B31","doi-asserted-by":"crossref","first-page":"469","DOI":"10.1186\/1471-2105-9-469","article-title":"Gene set analyses for interpreting microarray experiments on prokaryotic organisms","volume":"9","author":"Tintle","year":"2008","journal-title":"BMC Bioinformatics"},{"key":"2023013110150017000_B32","doi-asserted-by":"crossref","first-page":"265","DOI":"10.1093\/biostatistics\/kxl007","article-title":"Nonparametric pathway-based regression models for analysis of genomic data","volume":"8","author":"Wei","year":"2007","journal-title":"Biostatistics"},{"key":"2023013110150017000_B33","article-title":"The Practical Bioinformatician","author":"Wong","year":"2004","journal-title":"World Scientific Publishing Company."},{"key":"2023013110150017000_B34","doi-asserted-by":"crossref","first-page":"919","DOI":"10.1093\/bioinformatics\/btl034","article-title":"Non-linear tests for identifying differentially expressed genes or genetic networks","volume":"22","author":"Xiong","year":"2006","journal-title":"Bioinformatics"},{"key":"2023013110150017000_B35","doi-asserted-by":"crossref","first-page":"763","DOI":"10.1093\/bioinformatics\/17.9.763","article-title":"Principal component analysis for clustering gene expression data","volume":"17","author":"Yeung","year":"2001","journal-title":"Bioinformatics"},{"key":"2023013110150017000_B36","doi-asserted-by":"crossref","first-page":"88","DOI":"10.1093\/bioinformatics\/bti736","article-title":"Gene selection using support vector machines with non-convex 
penalty","volume":"22","author":"Zhang","year":"2006","journal-title":"Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/25\/7\/882\/48983454\/bioinformatics_25_7_882.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/25\/7\/882\/48983454\/bioinformatics_25_7_882.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,31]],"date-time":"2023-01-31T20:15:47Z","timestamp":1675196147000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/25\/7\/882\/211069"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2009,2,17]]},"references-count":36,"journal-issue":{"issue":"7","published-print":{"date-parts":[[2009,4,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btp085","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2009,4,1]]},"published":{"date-parts":[[2009,2,17]]}}}