{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,8,3]],"date-time":"2024-08-03T19:39:32Z","timestamp":1722713972278},"reference-count":29,"publisher":"Springer Science and Business Media LLC","issue":"1","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2012,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Background<\/jats:title><jats:p>Based on available biological information, genomic data can often be partitioned into pre-defined sets (e.g. pathways) and subsets within sets. Biologists are often interested in determining whether some pre-defined sets of variables (e.g. genes) are differentially expressed under varying experimental conditions. Several procedures are available in the literature for making such determinations; however, they do not take into account information regarding the subsets within each set. Moreover, variables (e.g. genes) belonging to a set or a subset are potentially correlated, yet such information is often ignored and univariate methods are used. This may result in a loss of power and\/or an inflated false positive rate.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>We introduce a multiple testing-based methodology that makes use of available information regarding biologically relevant subsets within each pre-defined set of variables while exploiting the underlying dependence structure among the variables. 
Using this methodology, a biologist may not only determine whether a set of variables is differentially expressed between two experimental conditions, but may also test whether specific subsets within a significant set are also significant.<\/jats:p><\/jats:sec><jats:sec><jats:title>Conclusions<\/jats:title><jats:p>The proposed methodology (a) is easy to implement, (b) does not require inverting potentially singular covariance matrices, (c) controls the family-wise error rate (FWER) at the desired nominal level, and (d) is robust to the underlying distribution and covariance structures. Although the methodology is described for microarray gene expression data for simplicity of exposition, it is also applicable to any high dimensional data, such as mRNA-seq data and CpG methylation data.<\/jats:p><\/jats:sec>","DOI":"10.1186\/1471-2105-13-177","type":"journal-article","created":{"date-parts":[[2012,7,24]],"date-time":"2012-07-24T18:23:00Z","timestamp":1343154180000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":4,"title":["Analysis of high dimensional data using pre-defined set and subset information, with applications to genomic data"],"prefix":"10.1186","volume":"13","author":[{"given":"Wenge","family":"Guo","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mingan","family":"Yang","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chuanhua","family":"Xing","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shyamal D","family":"Peddada","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2012,7,24]]},"reference":[{"key":"5367_CR1","doi-asserted-by":"publisher","first-page":"98","DOI":"10.1016\/S0888-7543(02)00021-6","volume":"81","author":"S 
Draghici","year":"2003","unstructured":"Draghici S, Khatri P, Martins RP, Ostermeier GC, Krawetz SA: Global functional profiling of gene expression. Genomics. 2003, 81: 98-104. 10.1016\/S0888-7543(02)00021-6.","journal-title":"Genomics"},{"key":"5367_CR2","doi-asserted-by":"publisher","first-page":"267","DOI":"10.1038\/ng1180","volume":"34","author":"VK Mootha","year":"2003","unstructured":"Mootha VK, Lindgren CM, Eriksson KF, Subramanian A, Sihag S, Lehar J, Puigserver P, Carlsson E, Ridderstr\u00e5le M, Laurila E, Houstis N, Daly MJ, Patterson N, Mesirov JP, Golub TR, Tamayo P, Spiegelman B, Lander ES, Hirschhorn JN, Altshuler D, Groop LC: PGC-1\u03b1-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat Genet. 2003, 34: 267-273. 10.1038\/ng1180.","journal-title":"Nat Genet"},{"key":"5367_CR3","doi-asserted-by":"publisher","first-page":"15545","DOI":"10.1073\/pnas.0506580102","volume":"102","author":"A Subramanian","year":"2005","unstructured":"Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, Mesirov JP: Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA. 2005, 102: 15545-15550. 10.1073\/pnas.0506580102.","journal-title":"Proc Natl Acad Sci USA"},{"key":"5367_CR4","doi-asserted-by":"publisher","first-page":"13544","DOI":"10.1073\/pnas.0506577102","volume":"102","author":"L Tian","year":"2005","unstructured":"Tian L, Greenberg SA, Kong SW, Altschuler J, Kohane IS, Park PJ: Discovering statistically significant pathways in expression profiling studies. Proc Natl Acad of Sci USA. 2005, 102: 13544-13549. 
10.1073\/pnas.0506577102.","journal-title":"Proc Natl Acad of Sci USA"},{"key":"5367_CR5","doi-asserted-by":"publisher","first-page":"107","DOI":"10.1214\/07-AOAS101","volume":"1","author":"B Efron","year":"2007","unstructured":"Efron B, Tibshirani R: On testing the significance of sets of genes. Ann Appl Stat. 2007, 1: 107-129. 10.1214\/07-AOAS101.","journal-title":"Ann Appl Stat"},{"key":"5367_CR6","doi-asserted-by":"publisher","first-page":"93","DOI":"10.1093\/bioinformatics\/btg382","volume":"20","author":"JJ Goeman","year":"2004","unstructured":"Goeman JJ, van de Geer SA, de Kort F, van Houwelingen HC: A global test for groups of genes: testing association with a clinical outcome. Bioinformatics. 2004, 20: 93-99. 10.1093\/bioinformatics\/btg382.","journal-title":"Bioinformatics"},{"key":"5367_CR7","doi-asserted-by":"publisher","first-page":"225","DOI":"10.1186\/1471-2105-6-225","volume":"6","author":"J Tomfohr","year":"2005","unstructured":"Tomfohr J, Lu J, Kepler TB: Pathway level analysis of gene expression using singular value decomposition. BMC Bioinformatics. 2005, 6: 225-10.1186\/1471-2105-6-225.","journal-title":"BMC Bioinformatics"},{"key":"5367_CR8","doi-asserted-by":"publisher","first-page":"2373","DOI":"10.1093\/bioinformatics\/btl401","volume":"22","author":"SW Kong","year":"2006","unstructured":"Kong SW, Pu WT, Park PJ: A multivariate approach for integrating genomewide expression data and biological knowledge. Bioinformatics. 2006, 22: 2373-2380. 10.1093\/bioinformatics\/btl401.","journal-title":"Bioinformatics"},{"key":"5367_CR9","doi-asserted-by":"publisher","first-page":"242","DOI":"10.1186\/1471-2105-8-242","volume":"8","author":"I Dinu","year":"2007","unstructured":"Dinu I, Potter JD, Mueller T, Liu Q, Adewale AJ, Jhangri GS, Einecke G, Famulski KS, Halloran P, Yasui Y: Improving gene set analysis of microarray data by SAM-GS. BMC Bioinformatics. 
2007, 8: 242-10.1186\/1471-2105-8-242.","journal-title":"BMC Bioinformatics"},{"key":"5367_CR10","doi-asserted-by":"publisher","first-page":"78","DOI":"10.1093\/bioinformatics\/btm531","volume":"24","author":"M Hummel","year":"2008","unstructured":"Hummel M, Meister R, Mansmann U: GlobalANCOVA: exploration and assessment of gene group effects. Bioinformatics. 2008, 24: 78-85. 10.1093\/bioinformatics\/btm531.","journal-title":"Bioinformatics"},{"key":"5367_CR11","doi-asserted-by":"publisher","first-page":"897","DOI":"10.1093\/bioinformatics\/btp098","volume":"25","author":"C Tsai","year":"2009","unstructured":"Tsai C, Chen J: Multivariate analysis of variance test for gene set analysis. Bioinformatics. 2009, 25: 897-903. 10.1093\/bioinformatics\/btp098.","journal-title":"Bioinformatics"},{"key":"5367_CR12","doi-asserted-by":"publisher","first-page":"2104","DOI":"10.1093\/bioinformatics\/btm310","volume":"23","author":"JJ Chen","year":"2007","unstructured":"Chen JJ, Lee T, Delongchamp RR, Chen T, Tsai CA: Significance analysis of groups of genes in expression profiling studies. Bioinformatics. 2007, 23: 2104-2112. 10.1093\/bioinformatics\/btm310.","journal-title":"Bioinformatics"},{"key":"5367_CR13","doi-asserted-by":"publisher","first-page":"269","DOI":"10.1089\/cmb.2008.0002","volume":"15","author":"AJ Adewale","year":"2008","unstructured":"Adewale AJ, Dinu I, Potter JD, Liu Q, Yasui Y: Pathway analysis of microarray data via regression. J Comput Biol. 2008, 15: 269-277. 10.1089\/cmb.2008.0002.","journal-title":"J Comput Biol"},{"key":"5367_CR14","doi-asserted-by":"publisher","first-page":"481","DOI":"10.1186\/1471-2105-9-481","volume":"9","author":"R Lin","year":"2008","unstructured":"Lin R, Dai S, Irwin RD, Heinloth AN, Boorman GA, Li L: Gene set enrichment analysis for non-monotone association and multiple experimental categories. BMC Bioinformatics. 
2008, 9: 481-10.1186\/1471-2105-9-481.","journal-title":"BMC Bioinformatics"},{"key":"5367_CR15","doi-asserted-by":"publisher","first-page":"980","DOI":"10.1093\/bioinformatics\/btm051","volume":"23","author":"JJ Goeman","year":"2007","unstructured":"Goeman JJ, Buhlmann P: Analyzing gene expression data in terms of gene sets: methodological issues. Bioinformatics. 2007, 23: 980-987. 10.1093\/bioinformatics\/btm051.","journal-title":"Bioinformatics"},{"key":"5367_CR16","doi-asserted-by":"publisher","first-page":"189","DOI":"10.1093\/bib\/bbn001","volume":"9","author":"D Nam","year":"2008","unstructured":"Nam D, Kim S: Gene-set approach for expression pattern analysis. Briefings in Bioinformatics. 2008, 9: 189-197. 10.1093\/bib\/bbn001.","journal-title":"Briefings in Bioinformatics"},{"key":"5367_CR17","volume-title":"Applied Multivariate Statistical Analysis (4th ed)","author":"R Johnson","year":"1998","unstructured":"Johnson R, Wichern D: Applied Multivariate Statistical Analysis (4th ed). 1998, Prentice Hall, Upper Saddle River, New Jersey, USA"},{"key":"5367_CR18","doi-asserted-by":"publisher","first-page":"517","DOI":"10.1093\/bioinformatics\/bti029","volume":"21","author":"BS Kim","year":"2005","unstructured":"Kim BS, Kim I, Lee S, Kim S, Rha SY, Chung HC: Statistical methods of translating microarray data into clinically relevant diagnostic information in colorectal cancer. Bioinformatics. 2005, 21: 517-528. 10.1093\/bioinformatics\/bti029.","journal-title":"Bioinformatics"},{"key":"5367_CR19","doi-asserted-by":"publisher","first-page":"3105","DOI":"10.1093\/bioinformatics\/bti496","volume":"21","author":"Y Lu","year":"2005","unstructured":"Lu Y, Liu P-Y, Xiao P, Deng H-W: Hotelling\u2019s T2 multivariate profiling for detecting differential expression in microarrays. Bioinformatics. 2005, 21: 3105-3113. 
10.1093\/bioinformatics\/bti496.","journal-title":"Bioinformatics"},{"key":"5367_CR20","doi-asserted-by":"publisher","first-page":"555","DOI":"10.1093\/biostatistics\/4.4.555","volume":"4","author":"A Szabo","year":"2003","unstructured":"Szabo A, Boucher K, Jones D, Tsodikov AD, Klebanov LB, Yakovlev AY: Multivariate exploratory tools for microarray data analysis. Biostatistics. 2003, 4: 555-567. 10.1093\/biostatistics\/4.4.555.","journal-title":"Biostatistics"},{"key":"5367_CR21","doi-asserted-by":"crossref","first-page":"32","DOI":"10.2202\/1544-6115.1175","volume":"4","author":"J Schafer","year":"2005","unstructured":"Schafer J, Strimmer K: A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Statist Appl Genet Mol Biol. 2005, 4: 32-","journal-title":"Statist Appl Genet Mol Biol"},{"key":"5367_CR22","doi-asserted-by":"crossref","first-page":"449","DOI":"10.1055\/s-0038-1633982","volume":"44","author":"U Mansmann","year":"2005","unstructured":"Mansmann U, Meister R: Testing differential gene expression in functional groups: Goeman\u2019s global test versus an ANCOVA approach. Method Inform Med. 2005, 44: 449-453.","journal-title":"Method Inform Med"},{"key":"5367_CR23","doi-asserted-by":"crossref","first-page":"289","DOI":"10.1111\/j.2517-6161.1995.tb02031.x","volume":"57","author":"Y Benjamini","year":"1995","unstructured":"Benjamini Y, Hochberg Y: Controlling the false discovery rate: A practical and powerful approach to multiple testing. J Royal Stat Soc B. 1995, 57: 289-300.","journal-title":"J Royal Stat Soc B"},{"key":"5367_CR24","doi-asserted-by":"publisher","first-page":"1480","DOI":"10.1007\/s11095-007-9266-8","volume":"24","author":"PJ Ferre","year":"2007","unstructured":"Ferre PJ, Liaubet L, Concordet D, SanCristobal M, Uro-Coste E, Tosser-Klopp G, Bonnet A, Toutain PL, Hatey F, Lefebvre HP: Longitudinal Analysis of Gene Expression in Porcine Skeletal Muscle After Post-Injection Local Injury. 
Pharm Res. 2007, 24: 1480-1489. 10.1007\/s11095-007-9266-8.","journal-title":"Pharm Res"},{"key":"5367_CR25","doi-asserted-by":"publisher","first-page":"77","DOI":"10.1198\/016214502753479248","volume":"97","author":"S Dudoit","year":"2002","unstructured":"Dudoit S, Fridlyand J, Speed T: Comparison of Discrimination Methods for the Classification of Tumors Using Gene Expression Data. J Am Stat Assoc. 2002, 97: 77-87. 10.1198\/016214502753479248.","journal-title":"J Am Stat Assoc"},{"key":"5367_CR26","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4899-4541-9","volume-title":"An Introduction to the Bootstrap","author":"B Efron","year":"1993","unstructured":"Efron B, Tibshirani R: An Introduction to the Bootstrap. 1993, Chapman & Hall\/CRC Monographs on Statistics & Applied Probability, New York, NY"},{"key":"5367_CR27","first-page":"45","volume":"64","author":"S Peddada","year":"2010","unstructured":"Peddada S, Harris S, Davidov O: Analysis of Correlated Gene Expression Data on Ordered Categories. J Ind Soc Agric Statist. 2010, 64: 45-60.","journal-title":"J Ind Soc Agric Statist"},{"key":"5367_CR28","doi-asserted-by":"publisher","first-page":"1019","DOI":"10.1093\/bioinformatics\/btp076","volume":"25","author":"R Heller","year":"2009","unstructured":"Heller R, Manduchi E, Grant GR, Ewens WJ: A flexible two-stage procedure for identifying gene sets that are differentially expressed. Bioinformatics. 2009, 25: 1019-1025. 10.1093\/bioinformatics\/btp076.","journal-title":"Bioinformatics"},{"key":"5367_CR29","doi-asserted-by":"publisher","first-page":"834","DOI":"10.1093\/bioinformatics\/btg093","volume":"19","author":"SD Peddada","year":"2003","unstructured":"Peddada SD, Lobenhofer L, Li L, Afshari C, Weinberg C, Umbach D: Gene selection and clustering for time-course and dose-response microarray experiments using order-restricted inference. Bioinformatics. 2003, 19: 834-841. 
10.1093\/bioinformatics\/btg093.","journal-title":"Bioinformatics"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-13-177.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,4,27]],"date-time":"2024-04-27T12:18:23Z","timestamp":1714220303000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-13-177"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,7,24]]},"references-count":29,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2012,12]]}},"alternative-id":["5367"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-13-177","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2012,7,24]]},"assertion":[{"value":"9 December 2011","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"11 May 2012","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"24 July 2012","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"177"}}