{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,23]],"date-time":"2026-03-23T11:35:00Z","timestamp":1774265700113,"version":"3.50.1"},"reference-count":29,"publisher":"Springer Science and Business Media LLC","issue":"1","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Algorithms Mol Biol"],"published-print":{"date-parts":[[2011,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Background<\/jats:title><jats:p>Clustering is a widely used technique for analysis of gene expression data. Most clustering methods group genes based on the distances, while few methods group genes according to the similarities of the distributions of the gene expression levels. Furthermore, as the biological annotation resources accumulated, an increasing number of genes have been annotated into functional categories. As a result, evaluating the performance of clustering methods in terms of the functional consistency of the resulting clusters is of great interest.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>In this paper, we proposed the WDCM (Weibull Distribution-based Clustering Method), a robust approach for clustering gene expression data, in which the gene expressions of individual genes are considered as the random variables following unique Weibull distributions. Our WDCM is based on the concept that the genes with similar expression profiles have similar distribution parameters, and thus the genes are clustered via the Weibull distribution parameters. We used the WDCM to cluster three cancer gene expression data sets from the lung cancer, B-cell follicular lymphoma and bladder carcinoma and obtained well-clustered results. We compared the performance of WDCM with k-means and Self Organizing Map (SOM) using functional annotation information given by the Gene Ontology (GO). The results showed that the functional annotation ratios of WDCM are higher than those of the other methods. We also utilized the external measure Adjusted Rand Index to validate the performance of the WDCM. The comparative results demonstrate that the WDCM provides the better clustering performance compared to k-means and SOM algorithms. The merit of the proposed WDCM is that it can be applied to cluster incomplete gene expression data without imputing the missing values. Moreover, the robustness of WDCM is also evaluated on the incomplete data sets.<\/jats:p><\/jats:sec><jats:sec><jats:title>Conclusions<\/jats:title><jats:p>The results demonstrate that our WDCM produces clusters with more consistent functional annotations than the other methods. The WDCM is also verified to be robust and is capable of clustering gene expression data containing a small quantity of missing values.<\/jats:p><\/jats:sec>","DOI":"10.1186\/1748-7188-6-14","type":"journal-article","created":{"date-parts":[[2011,5,31]],"date-time":"2011-05-31T18:17:14Z","timestamp":1306865834000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":20,"title":["A robust approach based on Weibull distribution for clustering gene expression data"],"prefix":"10.1186","volume":"6","author":[{"given":"Huakun","family":"Wang","sequence":"first","affiliation":[]},{"given":"Zhenzhen","family":"Wang","sequence":"additional","affiliation":[]},{"given":"Xia","family":"Li","sequence":"additional","affiliation":[]},{"given":"Binsheng","family":"Gong","sequence":"additional","affiliation":[]},{"given":"Lixin","family":"Feng","sequence":"additional","affiliation":[]},{"given":"Ying","family":"Zhou","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2011,5,31]]},"reference":[{"key":"127_CR1","doi-asserted-by":"publisher","first-page":"227","DOI":"10.1038\/73432","volume":"24","author":"DT Ross","year":"2000","unstructured":"Ross DT, Scherf U, Eisen MB, Perou CM, Rees C, Spellman P, Iyer V, Jeffrey SS, Van de Rijn M, Waltham M, Pergamenschikov A, Lee JC, Lashkari D, Shalon D, Myers TG, Weinstein JN, Botstein D, Brown PO: Systematic variation in gene expression patterns in human cancer cell lines. Nat Genet. 2000, 24: 227-235. 10.1038\/73432","journal-title":"Nat Genet"},{"key":"127_CR2","doi-asserted-by":"publisher","first-page":"1439","DOI":"10.1517\/13543784.7.9.1439","volume":"7","author":"J Schlom","year":"1998","unstructured":"Schlom J, Tsang KY, Kantor JA, Abrams SI, Zaremba S, Greiner J, Hodge JW: Cancer vaccine development. Expert Opin Investig Drugs. 1998, 7: 1439-1452. 10.1517\/13543784.7.9.1439","journal-title":"Expert Opin Investig Drugs"},{"key":"127_CR3","doi-asserted-by":"publisher","first-page":"1268","DOI":"10.1126\/science.276.5316.1268","volume":"276","author":"L Zhang","year":"1997","unstructured":"Zhang L, Zhou W, Velculescu VE, Kern SE, Hruban RH, Hamilton SR, Vogelstein B, Kinzler KW: Gene expression profiles in normal and cancer cells. Science. 1997, 276: 1268-1272. 10.1126\/science.276.5316.1268","journal-title":"Science"},{"key":"127_CR4","doi-asserted-by":"publisher","first-page":"843","DOI":"10.1586\/14737159.5.6.843","volume":"5","author":"A Khademhosseini","year":"2005","unstructured":"Khademhosseini A: Chips to Hits: microarray and microfluidic technologies for high-throughput analysis and drug discovery. September 12-15, 2005, MA, USA. Expert Rev Mol Diagn. 2005, 5: 843-846. 10.1586\/14737159.5.6.843","journal-title":"Expert Rev Mol Diagn"},{"key":"127_CR5","first-page":"M17","volume":"1423","author":"J Khan","year":"1999","unstructured":"Khan J, Bittner ML, Chen Y, Meltzer PS, Trent JM: DNA microarray technology: the anticipated impact on the study of human disease. Biochim Biophys Acta. 1999, 1423: M17-28.","journal-title":"Biochim Biophys Acta"},{"key":"127_CR6","doi-asserted-by":"publisher","first-page":"609","DOI":"10.1016\/S0958-1669(98)80138-9","volume":"9","author":"A Watson","year":"1998","unstructured":"Watson A, Mazumder A, Stewart M, Balasubramanian S: Technology for microarray analysis of gene expression. Curr Opin Biotechnol. 1998, 9: 609-614. 10.1016\/S0958-1669(98)80138-9","journal-title":"Curr Opin Biotechnol"},{"key":"127_CR7","doi-asserted-by":"publisher","first-page":"281","DOI":"10.1089\/106652799318274","volume":"6","author":"A Ben-Dor","year":"1999","unstructured":"Ben-Dor A, Shamir R, Yakhini Z: Clustering gene expression patterns. J Comput Biol. 1999, 6: 281-297. 10.1089\/106652799318274","journal-title":"J Comput Biol"},{"key":"127_CR8","doi-asserted-by":"publisher","first-page":"144","DOI":"10.1097\/00004691-200203000-00005","volume":"19","author":"MJ Guess","year":"2002","unstructured":"Guess MJ, Wilson SB: Introduction to hierarchical clustering. J Clin Neurophysiol. 2002, 19: 144-151. 10.1097\/00004691-200203000-00005","journal-title":"J Clin Neurophysiol"},{"key":"127_CR9","doi-asserted-by":"crossref","first-page":"444","DOI":"10.1055\/s-0038-1633991","volume":"44","author":"J Rahnenfuhrer","year":"2005","unstructured":"Rahnenfuhrer J: Clustering algorithms and other exploratory methods for microarray data analysis. Methods Inf Med. 2005, 44: 444-448.","journal-title":"Methods Inf Med"},{"key":"127_CR10","doi-asserted-by":"publisher","first-page":"331","DOI":"10.1093\/bib\/6.4.331","volume":"6","author":"PC Boutros","year":"2005","unstructured":"Boutros PC, Okey AB: Unsupervised pattern recognition: an introduction to the whys and wherefores of clustering microarray data. Brief Bioinform. 2005, 6: 331-343. 10.1093\/bib\/6.4.331","journal-title":"Brief Bioinform"},{"key":"127_CR11","doi-asserted-by":"publisher","first-page":"2537","DOI":"10.1162\/089976600300014836","volume":"12","author":"A Sierra","year":"2000","unstructured":"Sierra A, Corbacho F: Reclassification as supervised clustering. Neural Comput. 2000, 12: 2537-2546. 10.1162\/089976600300014836","journal-title":"Neural Comput"},{"key":"127_CR12","first-page":"281","volume-title":"the 5th Berkeley Symposium on Mathematical Statistics and Probability","author":"JB MacQueen","year":"1967","unstructured":"MacQueen JB: Some Methods for classification and Analysis of Multivariate Observations. the 5th Berkeley Symposium on Mathematical Statistics and Probability. 1967, 281-297. University of California Press"},{"key":"127_CR13","doi-asserted-by":"publisher","first-page":"397","DOI":"10.1007\/BF02348081","volume":"41","author":"B Gourevitch","year":"2003","unstructured":"Gourevitch B, Le Bouquin-Jeannes R: K-means clustering method for auditory evoked potentials selection. Med Biol Eng Comput. 2003, 41: 397-402. 10.1007\/BF02348081","journal-title":"Med Biol Eng Comput"},{"key":"127_CR14","doi-asserted-by":"publisher","first-page":"1149","DOI":"10.1016\/j.neunet.2004.07.010","volume":"17","author":"M Cottrell","year":"2004","unstructured":"Cottrell M, Ibbou S, Letremy P: SOM-based algorithms for qualitative variables. Neural Netw. 2004, 17: 1149-1167. 10.1016\/j.neunet.2004.07.010","journal-title":"Neural Netw"},{"key":"127_CR15","doi-asserted-by":"publisher","first-page":"3367","DOI":"10.1016\/j.watres.2006.07.027","volume":"40","author":"BH Lee","year":"2006","unstructured":"Lee BH, Scholz M: Application of the self-organizing map (SOM) to assess the heavy metal removal performance in experimental constructed wetlands. Water Res. 2006, 40: 3367-3374. 10.1016\/j.watres.2006.07.027","journal-title":"Water Res"},{"key":"127_CR16","doi-asserted-by":"crossref","first-page":"293","DOI":"10.1115\/1.4010337","volume":"18","author":"W Weibull","year":"1951","unstructured":"Weibull W: A statistical distribution function of wide applicability. J Appl Mech-Trans ASME. 1951, 18: 293-297.","journal-title":"J Appl Mech-Trans ASME"},{"key":"127_CR17","doi-asserted-by":"crossref","first-page":"290","DOI":"10.1111\/j.2517-6161.1976.tb01597.x","volume":"38","author":"BW Turnbull","year":"1976","unstructured":"Turnbull BW: The empirical distribution function with arbitrarily grouped, censored and truncated data. Journal of the Royal Statistical Society Series B. 1976, 38: 290-295.","journal-title":"Journal of the Royal Statistical Society Series B"},{"key":"127_CR18","doi-asserted-by":"publisher","first-page":"68","DOI":"10.1080\/01621459.1951.10500769","volume":"46","author":"J Frank","year":"1951","unstructured":"Frank J, Massey J: The Kolmogorov-Smirnov Test for Goodness of Fit. Journal of the American Statistical Association. 1951, 46: 68-78. 10.2307\/2280095","journal-title":"Journal of the American Statistical Association"},{"key":"127_CR19","doi-asserted-by":"publisher","first-page":"663","DOI":"10.1089\/adt.2007.071","volume":"5","author":"S Huang","year":"2007","unstructured":"Huang S, Yeo AA, Li SD: Modification of Kolmogorov-Smirnov test for DNA content data analysis through distribution alignment. Assay Drug Dev Technol. 2007, 5: 663-671. 10.1089\/adt.2007.071","journal-title":"Assay Drug Dev Technol"},{"key":"127_CR20","first-page":"376","volume":"14","author":"LD Ong","year":"1968","unstructured":"Ong LD, LeClare PC: The Kolmogorov-Smirnov test for the log-normality of sample cumulative frequency distributions. Health Phys. 1968, 14: 376-","journal-title":"Health Phys"},{"key":"127_CR21","volume-title":"The Mathematics Teacher","author":"R Clason","year":"1990","unstructured":"Clason R: Finding Clusters: An application of the Distance Concept. The Mathematics Teacher. 1990"},{"key":"127_CR22","volume-title":"Curr Protoc Bioinformatics","author":"JA Blake","year":"2008","unstructured":"Blake JA, Harris MA: The Gene Ontology (GO) project: structured vocabularies for molecular biology and their application to genome and expression analysis. Curr Protoc Bioinformatics. 2008, 7: Unit 7 2"},{"key":"127_CR23","doi-asserted-by":"publisher","first-page":"44","DOI":"10.1038\/nprot.2008.211","volume":"4","author":"W Huang da","year":"2009","unstructured":"Huang da W, Sherman BT, Lempicki RA: Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc. 2009, 4: 44-57.","journal-title":"Nat Protoc"},{"key":"127_CR24","doi-asserted-by":"publisher","first-page":"309","DOI":"10.1093\/bioinformatics\/17.4.309","volume":"17","author":"KY Yeung","year":"2001","unstructured":"Yeung KY, Haynor DR, Ruzzo WL: Validating clustering for gene expression data. Bioinformatics. 2001, 17: 309-318. 10.1093\/bioinformatics\/17.4.309","journal-title":"Bioinformatics"},{"key":"127_CR25","volume-title":"Statistical Indexes for Computational and Data Driven Class Discovery in Microarray Data. In Biological Data Mining","author":"DS R Giancarlo","year":"2009","unstructured":"R Giancarlo DS, Utro F: Statistical Indexes for Computational and Data Driven Class Discovery in Microarray Data. In Biological Data Mining. 2009, Chapman and Hall"},{"issue":"Suppl 12","key":"127_CR26","doi-asserted-by":"publisher","first-page":"S8","DOI":"10.1186\/1471-2105-10-S12-S8","volume":"10","author":"E Mosca","year":"2009","unstructured":"Mosca E, Bertoli G, Piscitelli E, Vilardo L, Reinbold RA, Zucchi I, Milanesi L: Identification of functionally related genes using data mining and data integration: a breast cancer case study. BMC Bioinformatics. 2009, 10 (Suppl 12): S8- 10.1186\/1471-2105-10-S12-S8","journal-title":"BMC Bioinformatics"},{"key":"127_CR27","doi-asserted-by":"publisher","first-page":"13790","DOI":"10.1073\/pnas.191502998","volume":"98","author":"A Bhattacharjee","year":"2001","unstructured":"Bhattacharjee A, Richards WG, Staunton J, Li C, Monti S, Vasa P, Ladd C, Beheshti J, Bueno R, Gillette M, Loda M, Weber G, Mark EJ, Lander ES, Wong W, Johnson BE, Golub TR, Sugarbaker DJ, Meyerson M: Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses. Proc Natl Acad Sci USA. 2001, 98: 13790-13795. 10.1073\/pnas.191502998","journal-title":"Proc Natl Acad Sci USA"},{"key":"127_CR28","doi-asserted-by":"publisher","first-page":"68","DOI":"10.1038\/nm0102-68","volume":"8","author":"MA Shipp","year":"2002","unstructured":"Shipp MA, Ross KN, Tamayo P, Weng AP, Kutok JL, Aguiar RC, Gaasenbeek M, Angelo M, Reich M, Pinkus GS, Ray TS, Koval MA, Last KW, Norton A, Lister TA, Mesirov J, Neuberg DS, Lander ES, Aster JC, Golub TR: Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning. Nat Med. 2002, 8: 68-74. 10.1038\/nm0102-68","journal-title":"Nat Med"},{"key":"127_CR29","doi-asserted-by":"publisher","first-page":"90","DOI":"10.1038\/ng1061","volume":"33","author":"L Dyrskjot","year":"2003","unstructured":"Dyrskjot L, Thykjaer T, Kruhoffer M, Jensen JL, Marcussen N, Hamilton-Dutoit S, Wolf H, Orntoft TF: Identifying distinct classes of bladder carcinoma using microarrays. Nat Genet. 2003, 33: 90-96. 10.1038\/ng1061","journal-title":"Nat Genet"}],"container-title":["Algorithms for Molecular Biology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1748-7188-6-14.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,5]],"date-time":"2025-03-05T22:34:31Z","timestamp":1741214071000},"score":1,"resource":{"primary":{"URL":"https:\/\/almob.biomedcentral.com\/articles\/10.1186\/1748-7188-6-14"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2011,5,31]]},"references-count":29,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2011,12]]}},"alternative-id":["127"],"URL":"https:\/\/doi.org\/10.1186\/1748-7188-6-14","relation":{},"ISSN":["1748-7188"],"issn-type":[{"value":"1748-7188","type":"electronic"}],"subject":[],"published":{"date-parts":[[2011,5,31]]},"assertion":[{"value":"24 December 2010","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"31 May 2011","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"31 May 2011","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"14"}}