{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,3,18]],"date-time":"2024-03-18T10:19:45Z","timestamp":1710757185709},"reference-count":31,"publisher":"Springer Science and Business Media LLC","issue":"S1","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2009,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:sec>\n            <jats:title>Background<\/jats:title>\n            <jats:p>Clustering analysis is a common statistical tool for knowledge discovery. It is mainly conducted when a project is still in the exploratory phase, without any a priori hypotheses. However, testing the statistical significance of differences between clusters can help researchers assess whether the classification produced by a clustering algorithm needs to be improved, even after the number of clusters has been determined by a well-established criterion. This is important when we want to identify highly specific patterns through classification.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Results<\/jats:title>\n            <jats:p>We proposed using a principal component (PC) test, which implements an exact <jats:italic>F<\/jats:italic> statistic for measures at multiple endpoints based on elliptical distribution theory, to assess the statistical significance of differences between clusters. A challenge in the implementation is the choice of the number (q) of principal components to be considered, which can severely influence the statistical power of the method. We optimized this choice through validation with a permutation test based on the clustering to be evaluated. 
The method was applied to a public dataset to classify genes according to their temporal gene expression profiles.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Conclusion<\/jats:title>\n            <jats:p>The results demonstrated that the PC tests were useful for determining the optimal number of clusters.<\/jats:p>\n          <\/jats:sec>","DOI":"10.1186\/1471-2105-10-s1-s26","type":"journal-article","created":{"date-parts":[[2009,1,30]],"date-time":"2009-01-30T20:04:59Z","timestamp":1233345899000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["Principal component tests: applied to temporal gene expression data"],"prefix":"10.1186","volume":"10","author":[{"given":"Wensheng","family":"Zhang","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hong-Bin","family":"Fang","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jiuzhou","family":"Song","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2009,1,30]]},"reference":[{"key":"3209_CR1","doi-asserted-by":"publisher","DOI":"10.1002\/9780470316801","volume-title":"Finding Groups in Data: An Introduction to Cluster Analysis","author":"L Kaufman","year":"1990","unstructured":"Kaufman L, Rousseeuw P: Finding Groups in Data: An Introduction to Cluster Analysis. 1990, Wiley, New York"},{"key":"3209_CR2","doi-asserted-by":"publisher","first-page":"159","DOI":"10.1007\/BF02294245","volume":"50","author":"GW Milligan","year":"1985","unstructured":"Milligan GW, Cooper MC: An examination of procedures for determining number of clusters in a data set. Psychometrika. 1985, 50: 159-179. 
10.1007\/BF02294245.","journal-title":"Psychometrika"},{"key":"3209_CR3","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1080\/03610927408827101","volume":"3","author":"T Calinski","year":"1974","unstructured":"Calinski T, Harabasz J: A dendrite method for cluster analysis. Commun Statist. 1974, 3: 1-27. 10.1080\/03610927408827101.","journal-title":"Commun Statist"},{"key":"3209_CR4","volume-title":"SAS\/STAT User's Guide","author":"SAS Institute","year":"2002","unstructured":"SAS Institute: SAS\/STAT User's Guide. 2002"},{"key":"3209_CR5","doi-asserted-by":"publisher","first-page":"611","DOI":"10.1198\/016214502760047131","volume":"97","author":"C Fraley","year":"2002","unstructured":"Fraley C, Raftery AE: Model-Based Clustering, Discriminant Analysis, and Density Estimation. Journal of the American Statistical Association. 2002, 97: 611-631. 10.1198\/016214502760047131.","journal-title":"Journal of the American Statistical Association"},{"issue":"4","key":"3209_CR6","doi-asserted-by":"publisher","first-page":"474","DOI":"10.1093\/bioinformatics\/btg014","volume":"19","author":"Y Luan","year":"2003","unstructured":"Luan Y, Li H: Clustering of time-course gene expression data using a mixed-effects model with B-splines. Bioinformatics. 2003, 19 (4): 474-482. 10.1093\/bioinformatics\/btg014.","journal-title":"Bioinformatics"},{"issue":"4","key":"3209_CR7","doi-asserted-by":"publisher","first-page":"1261","DOI":"10.1093\/nar\/gkl013","volume":"34","author":"P Ma","year":"2006","unstructured":"Ma P, Castillo-Davis CI, Zhong W, Liu JS: A data-driven clustering method for time course gene expression data. Nucleic Acids Res. 2006, 34 (4): 1261-1269. 
10.1093\/nar\/gkl013.","journal-title":"Nucleic Acids Res"},{"issue":"10","key":"3209_CR8","doi-asserted-by":"publisher","first-page":"977","DOI":"10.1093\/bioinformatics\/17.10.977","volume":"17","author":"KY Yeung","year":"2001","unstructured":"Yeung KY, Fraley C, Murua A, Raftery AE, Ruzzo WL: Model-based clustering and data transformations for gene expression data. Bioinformatics. 2001, 17 (10): 977-987. 10.1093\/bioinformatics\/17.10.977.","journal-title":"Bioinformatics"},{"issue":"3","key":"3209_CR9","doi-asserted-by":"publisher","first-page":"413","DOI":"10.1093\/bioinformatics\/18.3.413","volume":"18","author":"GJ McLachlan","year":"2002","unstructured":"McLachlan GJ, Bean RW, Peel D: A mixture model-based approach to the clustering of microarray expression data. Bioinformatics. 2002, 18 (3): 413-422. 10.1093\/bioinformatics\/18.3.413.","journal-title":"Bioinformatics"},{"key":"3209_CR10","doi-asserted-by":"publisher","first-page":"53","DOI":"10.1016\/0377-0427(87)90125-7","volume":"20","author":"PJ Rousseeuw","year":"1987","unstructured":"Rousseeuw PJ: Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics. 1987, 20: 53-65. 10.1016\/0377-0427(87)90125-7.","journal-title":"Journal of Computational and Applied Mathematics"},{"key":"3209_CR11","doi-asserted-by":"publisher","first-page":"411","DOI":"10.1111\/1467-9868.00293","volume":"63","author":"R Tibshirani","year":"2001","unstructured":"Tibshirani R, Walther G, Hastie T: Estimating the number of clusters in a dataset via the Gap statistic. Journal of the Royal Statistical Society B. 2001, 63: 411-423. 
10.1111\/1467-9868.00293.","journal-title":"Journal of the Royal Statistical Society B"},{"key":"3209_CR12","doi-asserted-by":"publisher","first-page":"36","DOI":"10.1186\/1471-2105-4-36","volume":"4","author":"M Smolkin","year":"2003","unstructured":"Smolkin M, Ghosh D: Cluster stability scores for microarray data in cancer studies. BMC Bioinformatics. 2003, 4: 36-10.1186\/1471-2105-4-36.","journal-title":"BMC Bioinformatics"},{"key":"3209_CR13","first-page":"241","volume":"12","author":"X Chen","year":"2002","unstructured":"Chen X, Jaradat SA, Banerjee N, Tanaka TS, Ko MSH, Zhang MQ: Evaluation and comparison of clustering algorithms in analyzing ES cell gene expression data. Statistica Sinica. 2002, 12: 241-262.","journal-title":"Statistica Sinica"},{"issue":"Suppl 4","key":"3209_CR14","doi-asserted-by":"publisher","first-page":"S17","DOI":"10.1186\/1471-2105-7-S4-S17","volume":"7","author":"S Datta","year":"2006","unstructured":"Datta S, Datta S: Evaluation of clustering algorithms for gene expression data. BMC Bioinformatics. 2006, 7 (Suppl 4): S17-10.1186\/1471-2105-7-S4-S17.","journal-title":"BMC Bioinformatics"},{"issue":"4","key":"3209_CR15","doi-asserted-by":"publisher","first-page":"24","DOI":"10.1145\/155775.155781","volume":"20","author":"KEE Raatikainen","year":"1993","unstructured":"Raatikainen KEE: Cluster analysis and workload classification. Performance Evaluation Review. 1993, 20 (4): 24-30. 10.1145\/155775.155781.","journal-title":"Performance Evaluation Review"},{"issue":"19","key":"3209_CR16","doi-asserted-by":"publisher","first-page":"2405","DOI":"10.1093\/bioinformatics\/btl406","volume":"22","author":"A Thalamuthu","year":"2006","unstructured":"Thalamuthu A, Mukhopadhyay I, Zheng X, Tseng GC: Evaluation and comparison of gene clustering methods in microarray analysis. Bioinformatics. 2006, 22 (19): 2405-2412. 
10.1093\/bioinformatics\/btl406.","journal-title":"Bioinformatics"},{"key":"3209_CR17","doi-asserted-by":"publisher","first-page":"846","DOI":"10.1080\/01621459.1971.10482356","volume":"66","author":"WM Rand","year":"1971","unstructured":"Rand WM: Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association. 1971, 66: 846-856. 10.2307\/2284239.","journal-title":"Journal of the American Statistical Association"},{"issue":"4","key":"3209_CR18","doi-asserted-by":"publisher","first-page":"309","DOI":"10.1093\/bioinformatics\/17.4.309","volume":"17","author":"KY Yeung","year":"2001","unstructured":"Yeung KY, Haynor DR, Ruzzo WL: Validating clustering for gene expression data. Bioinformatics. 2001, 17 (4): 309-318. 10.1093\/bioinformatics\/17.4.309.","journal-title":"Bioinformatics"},{"issue":"25","key":"3209_CR19","doi-asserted-by":"publisher","first-page":"14863","DOI":"10.1073\/pnas.95.25.14863","volume":"95","author":"MB Eisen","year":"1998","unstructured":"Eisen MB, Spellman PT, Brown PO, Botstein D: Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci USA. 1998, 95 (25): 14863-14868. 10.1073\/pnas.95.25.14863.","journal-title":"Proc Natl Acad Sci USA"},{"issue":"3","key":"3209_CR20","doi-asserted-by":"publisher","first-page":"255","DOI":"10.1038\/ng906","volume":"31","author":"LF Wu","year":"2002","unstructured":"Wu LF, Hughes TR, Davierwala AP, Robinson MD, Stoughton R, Altschuler SJ: Large-scale prediction of Saccharomyces cerevisiae gene function using overlapping transcriptional clusters. Nat Genet. 2002, 31 (3): 255-265. 10.1038\/ng906.","journal-title":"Nat Genet"},{"issue":"5","key":"3209_CR21","doi-asserted-by":"publisher","first-page":"965","DOI":"10.1101\/gr.1144503","volume":"13","author":"A Lægreid","year":"2003","unstructured":"Lægreid A, Hvidsten TR, Midelfart H, Komorowski J, Sandvik AK: Predicting gene ontology biological process from temporal gene expression patterns. 
Genome Res. 2003, 13 (5): 965-979. 10.1101\/gr.1144503.","journal-title":"Genome Res"},{"key":"3209_CR22","doi-asserted-by":"publisher","first-page":"360","DOI":"10.1214\/aoms\/1177732979","volume":"2","author":"H Hotelling","year":"1931","unstructured":"Hotelling H: The generalization of Student's ratio. Ann Math Statist. 1931, 2: 360-378. 10.1214\/aoms\/1177732979.","journal-title":"Ann Math Statist"},{"key":"3209_CR23","doi-asserted-by":"publisher","first-page":"964","DOI":"10.2307\/2533057","volume":"52","author":"J Läuter","year":"1996","unstructured":"Läuter J: Exact t and F tests for analyzing studies with multiple endpoints. Biometrics. 1996, 52: 964-970. 10.2307\/2533057.","journal-title":"Biometrics"},{"key":"3209_CR24","doi-asserted-by":"publisher","first-page":"1079","DOI":"10.2307\/2531158","volume":"40","author":"PC O'Brien","year":"1984","unstructured":"O'Brien PC: Procedures for comparing samples with multiple endpoints. Biometrics. 1984, 40: 1079-1087. 10.2307\/2531158.","journal-title":"Biometrics"},{"issue":"5398","key":"3209_CR25","doi-asserted-by":"publisher","first-page":"83","DOI":"10.1126\/science.283.5398.83","volume":"283","author":"VR Iyer","year":"1999","unstructured":"Iyer VR, Eisen MB, Ross DT, Schuler G, Moore T, Lee JC, Trent JM, Staudt LM, Hudson J, Boguski MS: The transcriptional program in the response of human fibroblasts to serum. Science. 1999, 283 (5398): 83-87. 10.1126\/science.283.5398.83.","journal-title":"Science"},{"issue":"9","key":"3209_CR26","doi-asserted-by":"publisher","first-page":"2129","DOI":"10.1101\/gr.772403","volume":"13","author":"PD Thomas","year":"2003","unstructured":"Thomas PD, Campbell MJ, Kejariwal A, Mi H, Karlak B, Daverman R, Diemer K, Muruganujan A, Narechania A: PANTHER: a library of protein families and subfamilies indexed by function. Genome Res. 2003, 13 (9): 2129-2141. 
10.1101\/gr.772403.","journal-title":"Genome Res"},{"key":"3209_CR27","volume-title":"Term-tissue specific models for prediction of gene ontology biological processes using transcriptional profiles of aging in D. melanogaster","author":"W Zhang","year":"2007","unstructured":"Zhang W, Song JZ: Term-tissue specific models for prediction of gene ontology biological processes using transcriptional profiles of aging in D. melanogaster. 2007"},{"key":"3209_CR28","volume-title":"Generalized multivariate analysis","author":"K-T Fang","year":"1990","unstructured":"Fang K-T, Zhang J: Generalized multivariate analysis. 1990, Beijing: Science Press; Berlin, Heidelberg: Springer-Verlag"},{"issue":"1","key":"3209_CR29","doi-asserted-by":"publisher","first-page":"253","DOI":"10.1111\/j.0006-341X.2001.00253.x","volume":"57","author":"JA Rice","year":"2001","unstructured":"Rice JA, Wu CO: Nonparametric mixed effects models for unequally sampled noisy curves. Biometrics. 2001, 57 (1): 253-259. 10.1111\/j.0006-341X.2001.00253.x.","journal-title":"Biometrics"},{"key":"3209_CR30","volume-title":"Smoothing Spline ANOVA Models","author":"C Gu","year":"2000","unstructured":"Gu C: Smoothing Spline ANOVA Models. 2000, Springer-Verlag"},{"issue":"1","key":"3209_CR31","doi-asserted-by":"publisher","first-page":"25","DOI":"10.1038\/75556","volume":"25","author":"M Ashburner","year":"2000","unstructured":"Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000, 25 (1): 25-29. 
10.1038\/75556.","journal-title":"Nat Genet"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-10-S1-S26.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,1]],"date-time":"2021-09-01T10:50:44Z","timestamp":1630493444000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-10-S1-S26"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2009,1]]},"references-count":31,"journal-issue":{"issue":"S1","published-print":{"date-parts":[[2009,1]]}},"alternative-id":["3209"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-10-s1-s26","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2009,1]]},"assertion":[{"value":"30 January 2009","order":1,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"S26"}}