{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,20]],"date-time":"2025-10-20T10:07:05Z","timestamp":1760954825722},"reference-count":39,"publisher":"Springer Science and Business Media LLC","issue":"1","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2006,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:sec>\n            <jats:title>Background<\/jats:title>\n            <jats:p>There are many methods for analyzing microarray data that group together genes having similar patterns of expression over all conditions tested. However, in many instances the biologically important goal is to identify relatively small sets of genes that share coherent expression across only some conditions, rather than all or most conditions as required in traditional clustering; e.g. genes that are highly up-regulated and\/or down-regulated similarly across only a subset of conditions. Equally important is the need to learn which conditions are the decisive ones in forming such gene sets of interest, and how they relate to diverse conditional covariates, such as disease diagnosis or prognosis.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Results<\/jats:title>\n            <jats:p>We present a method for automatically identifying such candidate sets of biologically relevant genes using a combination of principal components analysis and information theoretic metrics. To enable easy use of our methods, we have developed a data analysis package that facilitates visualization and subsequent data mining of the independent sources of significant variation present in gene microarray expression datasets (or in any other similarly structured high-dimensional dataset). 
We applied these tools to two public datasets, and highlight sets of genes most affected by specific subsets of conditions (e.g. tissues, treatments, samples, etc.). Statistically significant associations for highlighted gene sets were shown via global analysis for Gene Ontology term enrichment. Together with covariate associations, the tool provides a basis for building testable hypotheses about the biological or experimental causes of observed variation.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Conclusion<\/jats:title>\n            <jats:p>We provide an unsupervised data mining technique for diverse microarray expression datasets that is distinct from major methods now in routine use. In test uses, this method, based on publicly available gene annotations, appears to identify numerous sets of biologically relevant genes. It has proven especially valuable in instances where there are many diverse conditions (10's to hundreds of different tissues or cell types), a situation in which many clustering and ordering algorithms become problematic. 
This approach also shows promise in other topic domains such as multi-spectral imaging datasets.<\/jats:p>\n          <\/jats:sec>","DOI":"10.1186\/1471-2105-7-194","type":"journal-article","created":{"date-parts":[[2006,4,20]],"date-time":"2006-04-20T14:29:34Z","timestamp":1145543374000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":53,"title":["Mining gene expression data by interpreting principal components"],"prefix":"10.1186","volume":"7","author":[{"given":"Joseph C","family":"Roden","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Brandon W","family":"King","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Diane","family":"Trout","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ali","family":"Mortazavi","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Barbara J","family":"Wold","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Christopher E","family":"Hart","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2006,4,7]]},"reference":[{"issue":"3","key":"933_CR1","doi-asserted-by":"publisher","first-page":"281","DOI":"10.1038\/10343","volume":"22","author":"S Tavazoie","year":"1999","unstructured":"Tavazoie S, Hughes JD, Campbell MJ, Cho RJ, Church GM: Systematic determination of genetic network architecture. Nat Genet 1999, 22(3):281\u2013285. 
10.1038\/10343","journal-title":"Nat Genet"},{"issue":"6","key":"933_CR2","doi-asserted-by":"publisher","first-page":"2907","DOI":"10.1073\/pnas.96.6.2907","volume":"96","author":"P Tamayo","year":"1999","unstructured":"Tamayo P, Slonim D, Mesirov J, Zhu Q, Kitareewan S, Dmitrovsky E, Lander E, Golub T: Interpreting patterns of gene expression with self-organizing maps: Methods and application to hematopoietic differentiation. Proc Natl Acad Sci USA 1999, 96(6):2907\u20132912. 10.1073\/pnas.96.6.2907","journal-title":"Proc Natl Acad Sci USA"},{"issue":"25","key":"933_CR3","doi-asserted-by":"publisher","first-page":"14863","DOI":"10.1073\/pnas.95.25.14863","volume":"95","author":"M Eisen","year":"1998","unstructured":"Eisen M, Spellman P, Brown P, Botstein D: Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci USA 1998, 95(25):14863\u201314868. 10.1073\/pnas.95.25.14863","journal-title":"Proc Natl Acad Sci USA"},{"key":"933_CR4","doi-asserted-by":"publisher","first-page":"692","DOI":"10.1016\/S0743-7315(03)00085-6","volume":"63","author":"R Wang","year":"2003","unstructured":"Wang R, Scharenbroich L, Hart C, Wold B, Mjolsness E: Clustering analysis of microarray gene expression data by splitting algorithm. J Parallel Distrib Comput 2003, 63: 692\u2013706. 10.1016\/S0743-7315(03)00085-6","journal-title":"J Parallel Distrib Comput"},{"issue":"10","key":"933_CR5","doi-asserted-by":"publisher","first-page":"977","DOI":"10.1093\/bioinformatics\/17.10.977","volume":"17","author":"KY Yeung","year":"2001","unstructured":"Yeung KY, Fraley C, Murua A, Raftery AE, Ruzzo WL: Model-based clustering and data transformations for gene expression data. Bioinformatics 2001, 17(10):977\u2013987. 
10.1093\/bioinformatics\/17.10.977","journal-title":"Bioinformatics"},{"issue":"12","key":"933_CR6","doi-asserted-by":"publisher","first-page":"6745","DOI":"10.1073\/pnas.96.12.6745","volume":"96","author":"U Alon","year":"1999","unstructured":"Alon U, Barkai N, Notterman DA, Gish K, Ybarra S, Mack D, Levine AJ: Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc Natl Acad Sci USA 1999, 96(12):6745\u20136750. 10.1073\/pnas.96.12.6745","journal-title":"Proc Natl Acad Sci USA"},{"key":"933_CR7","doi-asserted-by":"publisher","first-page":"418","DOI":"10.1038\/35076576","volume":"2","author":"J Quackenbush","year":"2001","unstructured":"Quackenbush J: Computational Analysis of Microarray Data. Nature Reviews Genetics 2001, 2: 418\u2013427. 10.1038\/35076576","journal-title":"Nature Reviews Genetics"},{"issue":"Suppl","key":"933_CR8","doi-asserted-by":"publisher","first-page":"502","DOI":"10.1038\/ng1033","volume":"32","author":"DK Slonim","year":"2002","unstructured":"Slonim DK: From patterns to pathways: gene expression data analysis comes of age. Nat Genet 2002, 32(Suppl):502\u20138. 10.1038\/ng1033","journal-title":"Nat Genet"},{"key":"933_CR9","doi-asserted-by":"crossref","unstructured":"Hart CE, Sharenbroich L, Bornstein BJ, Trout D, King B, Mjolsness E, Wold BJ: A Mathematical and computational framework for quantitative comparison and integration of large scale gene expression data. Nucleic Acids Research 33(8):2580\u20132594. 2005, May 10 10.1093\/nar\/gki536","DOI":"10.1093\/nar\/gki536"},{"key":"933_CR10","volume-title":"Inferring Genetic Regulatory Network Structure: Integrative Analysis of Genome-Scale Data","author":"CE Hart","year":"2005","unstructured":"Hart CE: Inferring Genetic Regulatory Network Structure: Integrative Analysis of Genome-Scale Data. 
PhD Thesis, California Institute of Technology; 2005."},{"key":"933_CR11","doi-asserted-by":"crossref","unstructured":"Brown MPS, Grundy WN, Lin D, Cristianini N, Sugnet CW, Furey TS, Ares M, Haussler D: Knowledge-based analysis of microarray gene expression data by using support vector machines. Proc Natl Acad Sci USA 97(1):262\u2013267. 2000, January 4 10.1073\/pnas.97.1.262","DOI":"10.1073\/pnas.97.1.262"},{"key":"933_CR12","doi-asserted-by":"crossref","unstructured":"Mjolsness E, DeCoste D: Machine learning for science: state of the art and future prospects. Science 293(5537):2051\u20132055. 2001 Sep 14 10.1126\/science.293.5537.2051","DOI":"10.1126\/science.293.5537.2051"},{"key":"933_CR13","doi-asserted-by":"crossref","unstructured":"Ramaswamy S, Tamayo P, Rifkin R, Mukherjee S, Yeang CH, Angelo M, Ladd C, Reich M, Latulippe E, Mesirov JP, Poggio T, Gerald W, Loda M, Lander ES, Golub TR: Multiclass cancer diagnosis using tumor gene expression signatures. PNAS (26):15149\u201315154. 2001, Dec 18","DOI":"10.1073\/pnas.211566398"},{"key":"933_CR14","doi-asserted-by":"crossref","unstructured":"Tothill RW, Kowalczyk A, Rischin D, Bousioutas A, Haviv I, van Laar RK, Waring PM, Zalcberg J, Ward R, Biankin AV, Sutherland RL, Henshall SM, Fong K, Pollack JR, Bowtell DDL, Holloway AJ: An expression-based site of origin diagnostic method designed for clinical application to cancer of unknown origin. Cancer Research 65(10):4031\u20134040. 2005, May 15 10.1158\/0008-5472.CAN-04-3617","DOI":"10.1158\/0008-5472.CAN-04-3617"},{"issue":"4","key":"933_CR15","doi-asserted-by":"crossref","first-page":"370","DOI":"10.1038\/ng941","volume":"31","author":"J Ihmels","year":"2002","unstructured":"Ihmels J, Friedlander G, Bergmann S, Sarig O, Ziv Y, Barkai N: Revealing modular organization in the yeast transcriptional network. 
Nat Genet 2002, 31(4):370\u2013377.","journal-title":"Nat Genet"},{"issue":"3 Pt 1","key":"933_CR16","doi-asserted-by":"publisher","first-page":"031902","DOI":"10.1103\/PhysRevE.67.031902","volume":"67","author":"S Bergmann","year":"2003","unstructured":"Bergmann S, Ihmels J, Barkai N: Iterative signature algorithm for the analysis of large-scale gene expression data. Phys Rev E Stat Nonlin Soft Matter Phys 2003, 67(3 Pt 1):031902.","journal-title":"Phys Rev E Stat Nonlin Soft Matter Phys"},{"issue":"9","key":"933_CR17","doi-asserted-by":"publisher","first-page":"763","DOI":"10.1093\/bioinformatics\/17.9.763","volume":"17","author":"KY Yeung","year":"2001","unstructured":"Yeung KY, Ruzzo WL: Principal component analysis for clustering gene expression data. Bioinformatics 2001, 17(9):763\u2013774. 10.1093\/bioinformatics\/17.9.763","journal-title":"Bioinformatics"},{"key":"933_CR18","doi-asserted-by":"crossref","unstructured":"Khan J, Wei JS, Ringner M, Saal LH, Ladanyi M, Westermann F, Berthold F, Schwab M, Antonescu CR, Peterson C, Meltzer PS: Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nat Med 2001, (7):673\u2013679. 10.1038\/89044","DOI":"10.1038\/89044"},{"issue":"1","key":"933_CR19","doi-asserted-by":"publisher","first-page":"39","DOI":"10.1093\/bioinformatics\/18.1.39","volume":"18","author":"D Nguyen","year":"2002","unstructured":"Nguyen D, Rocke D: Tumor classification by partial least squares using microarray gene expression data. Bioinformatics 2002, 18(1):39\u201350. 10.1093\/bioinformatics\/18.1.39","journal-title":"Bioinformatics"},{"key":"933_CR20","doi-asserted-by":"crossref","unstructured":"Wen X, Fuhrman S, Michaels GS, Carr DB, Smith S, Barker JL, Somogyi R: Large-scale temporal gene expression mapping of central nervous system development. Proc Natl Acad Sci U S A 95(1):334\u2013339. 
1998, January 6 10.1073\/pnas.95.1.334","DOI":"10.1073\/pnas.95.1.334"},{"issue":"1","key":"933_CR21","doi-asserted-by":"publisher","first-page":"207","DOI":"10.1093\/bioinformatics\/18.1.207","volume":"18","author":"A Sturn","year":"2002","unstructured":"Sturn A, Quackenbush J, Trajanoski Z: Genesis: cluster analysis of microarray data. Bioinformatics application note 2002, 18(1):207\u2013208.","journal-title":"Bioinformatics application note"},{"key":"933_CR22","doi-asserted-by":"publisher","first-page":"453","DOI":"10.1093\/jnci\/91.5.453","volume":"91","author":"SG Hilsenbeck","year":"1999","unstructured":"Hilsenbeck SG, Friedrichs WE, Schiff R, O'Connell P, Hansen RK, Osborne CK, Fuqua SAW: Statistical Analysis of Array Expression Data as Applied to the Problem of Tamoxifen Resistance. J Natl Cancer Institute 1999, 91: 453\u2013459. 10.1093\/jnci\/91.5.453","journal-title":"J Natl Cancer Institute"},{"key":"933_CR23","first-page":"455","volume-title":"Pac Symp Biocomput","author":"S Raychaudhuri","year":"2000","unstructured":"Raychaudhuri S, Stuart JM, Altman RB: Principal Components Analysis to Summarize Microarray Experiments: Application to Sporulation Time Series. Pac Symp Biocomput 2000, 455\u2013466."},{"key":"933_CR24","doi-asserted-by":"publisher","first-page":"699","DOI":"10.1126\/science.282.5389.699","volume":"282","author":"S Chu","year":"1998","unstructured":"Chu S, DeRisi J, Eisen M, Mulholland J, Botstein D, Brown PO, Herskowitz I: The transcriptional program of sporulation in budding yeast. Science 1998, 282: 699\u2013705. 10.1126\/science.282.5389.699","journal-title":"Science"},{"key":"933_CR25","doi-asserted-by":"publisher","first-page":"566","DOI":"10.1093\/bioinformatics\/17.6.566","volume":"17","author":"ME Wall","year":"2001","unstructured":"Wall ME, Dyck PA, Brettin TS: SVDMAN \u2013 Singular value decomposition analysis of microarray data. Bioinformatics 2001, 17: 566\u2013568. 
10.1093\/bioinformatics\/17.6.566","journal-title":"Bioinformatics"},{"key":"933_CR26","doi-asserted-by":"crossref","unstructured":"Selaru FM, Yin J, Olaru A, Mori Y, Xu Y, Epstein SH, Sato F, Deacu E, Wang S, Sterian A, Fulton A, Abraham JM, Shibata D, Baquet C, Stass SA, Meltzer SJ: An Unsupervised Approach to Identify Molecular Phenotypic Components Influencing Breast Cancer Features. Cancer Research (64):1584\u20131588. 2004, March 1","DOI":"10.1158\/0008-5472.CAN-03-3208"},{"key":"933_CR27","unstructured":"The CompClust software package[http:\/\/woldlab.caltech.edu\/compclust]"},{"issue":"3","key":"933_CR28","doi-asserted-by":"publisher","first-page":"189","DOI":"10.1007\/BF01617722","volume":"11","author":"AD Forbes","year":"1995","unstructured":"Forbes AD: Classification-algorithm evaluation: five performance measures based on confusion matrices. J Clin Monit 1995, 11(3):189\u2013206. 10.1007\/BF01617722","journal-title":"J Clin Monit"},{"key":"933_CR29","unstructured":"The CompClustWeb software demonstration[http:\/\/woldlab.caltech.edu\/publications\/pca-bmc-2005\/demo]"},{"key":"933_CR30","unstructured":"Matplotlib\/pylab \u2013 matlab style python plotting (plots, graphs, charts)[http:\/\/matplotlib.sourceforge.net]"},{"key":"933_CR31","unstructured":"RPy home page[http:\/\/rpy.sourceforge.net]"},{"key":"933_CR32","unstructured":"Gary Strangman's Python Modules[http:\/\/www.nmr.mgh.harvard.edu\/Neural_Systems_Group\/gary\/python.html]"},{"key":"933_CR33","doi-asserted-by":"crossref","unstructured":"HG_U133A\/GNF1H and GNF1M Tissue Atlas Datasets, Su AI, Wiltshire T, Batalov S, Lapp H, Ching KA, Block D, Zhang J, Soden R, Hayakawa M, Kreiman G, Cooke MP, Walker JR, Hogenesch JB: A gene atlas of the mouse and human protein-encoding transcriptomes. Proc Natl Acad Sci USA 101(16):6062\u20136067. 
2004, Apr 20","DOI":"10.1073\/pnas.0400782101"},{"key":"933_CR34","unstructured":"The GNF SymAtlas web application[http:\/\/symatlas.gnf.org\/SymAtlas]"},{"key":"933_CR35","unstructured":"Supplemental materials web site[http:\/\/woldlab.caltech.edu\/publications\/pca-bmc-2005]"},{"key":"933_CR36","unstructured":"Mortazavi and Wold, in preparation"},{"issue":"3","key":"933_CR37","doi-asserted-by":"publisher","first-page":"267","DOI":"10.1038\/ng1180","volume":"34","author":"VK Mootha","year":"2003","unstructured":"Mootha VK, Lindgren CM, Eriksson KF, Subramanian A, Sihag S, Lehar J, Puigserver P, Carlsson E, Ridderstrale M, Laurila E, Houstis N, Daly MJ, Patterson N, Mesirov JP, Golub TR, Tamayo P, Spiegelman B, Lander ES, Hirschhorn JN, Altshuler D, Groop LC: PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat Genet 2003, 34(3):267\u2013273. 10.1038\/ng1180","journal-title":"Nat Genet"},{"key":"933_CR38","unstructured":"Broad Institute Cancer Program dataset repository[http:\/\/www.broad.mit.edu\/cgi-bin\/cancer\/datasets.cgi]"},{"issue":"11","key":"933_CR39","doi-asserted-by":"publisher","first-page":"1454","DOI":"10.1093\/bioinformatics\/18.11.1454","volume":"18","author":"OG Troyanskaya","year":"2002","unstructured":"Troyanskaya OG, Garber ME, Brown PO, Botstein D, Altman RB: Nonparametric methods for identifying differentially expressed genes in microarray data. Bioinformatics 2002, 18(11):1454\u20131461. 
10.1093\/bioinformatics\/18.11.1454","journal-title":"Bioinformatics"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-7-194.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,1]],"date-time":"2021-09-01T10:59:02Z","timestamp":1630493942000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-7-194"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2006,4,7]]},"references-count":39,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2006,12]]}},"alternative-id":["933"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-7-194","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2006,4,7]]},"assertion":[{"value":"3 July 2005","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"7 April 2006","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"7 April 2006","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"194"}}