{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,19]],"date-time":"2025-10-19T18:16:10Z","timestamp":1760897770170},"reference-count":37,"publisher":"Springer Science and Business Media LLC","issue":"1","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2007,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:sec>\n            <jats:title>Background<\/jats:title>\n            <jats:p>Clustering methods are widely used on gene expression data to categorize genes with similar expression profiles. Finding an appropriate (dis)similarity measure is critical to the analysis. In our study, we developed a new measure for clustering the genes when the key factor is the shape of the profile, and when the expression magnitude should also be accounted for in determining the gene relationship. This is achieved by modeling the shape and magnitude parameters separately in a gene expression profile, and then using the estimated shape and magnitude parameters to define a measure in a new feature space.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Results<\/jats:title>\n            <jats:p>We explored several different transformation schemes to construct the feature spaces that include a space whose features are determined by the mutual differences of the original expression components, a space derived from a parametric covariance matrix, and the principal component space in traditional PCA analysis. The former two are the newly proposed and the latter is explored for comparison purposes. The new measures we defined in these feature spaces were employed in a <jats:italic>K<\/jats:italic>-means clustering procedure to perform analyses. 
Applying these algorithms to a simulation dataset, a developing mouse retina SAGE dataset, a small yeast sporulation cDNA dataset, and a maize root affymetrix microarray dataset, we found from the results that the algorithm associated with the first feature space, named <jats:italic>TransChisq<\/jats:italic>, showed clear advantages over other methods.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Conclusion<\/jats:title>\n            <jats:p>The proposed <jats:italic>TransChisq<\/jats:italic> is very promising in capturing meaningful gene expression clusters. This study also demonstrates the importance of data transformations in defining an efficient distance measure. Our method should provide new insights in analyzing gene expression data. The clustering algorithms are available upon request.<\/jats:p>\n          <\/jats:sec>","DOI":"10.1186\/1471-2105-8-29","type":"journal-article","created":{"date-parts":[[2007,1,27]],"date-time":"2007-01-27T07:15:55Z","timestamp":1169882155000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":21,"title":["Measuring similarities between gene expression profiles through new data transformations"],"prefix":"10.1186","volume":"8","author":[{"given":"Kyungpil","family":"Kim","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shibo","family":"Zhang","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Keni","family":"Jiang","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Li","family":"Cai","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"In-Beum","family":"Lee","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Lewis 
J","family":"Feldman","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Haiyan","family":"Huang","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2007,1,27]]},"reference":[{"key":"1401_CR1","doi-asserted-by":"publisher","first-page":"17","DOI":"10.1016\/S0014-5793(00)01772-5","volume":"480","author":"A Brazma","year":"2000","unstructured":"Brazma A, Vilo J: Gene expression data analysis. FEBS Lett 2000, 480: 17\u201324. 10.1016\/S0014-5793(00)01772-5","journal-title":"FEBS Lett"},{"key":"1401_CR2","doi-asserted-by":"publisher","first-page":"418","DOI":"10.1038\/35076576","volume":"2","author":"J Quackenbush","year":"2001","unstructured":"Quackenbush J: Computational analysis of microarray data. Nat Rev Genet 2001, 2: 418\u2013427. 10.1038\/35076576","journal-title":"Nat Rev Genet"},{"key":"1401_CR3","doi-asserted-by":"publisher","first-page":"14863","DOI":"10.1073\/pnas.95.25.14863","volume":"95","author":"MB Eisen","year":"1998","unstructured":"Eisen MB, Spellman PT, Brown PO, Botstein D: Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci USA 1998, 95: 14863\u201314868. 10.1073\/pnas.95.25.14863","journal-title":"Proc Natl Acad Sci USA"},{"key":"1401_CR4","doi-asserted-by":"publisher","first-page":"241","DOI":"10.1007\/BF02289588","volume":"2","author":"SC Johnson","year":"1967","unstructured":"Johnson SC: Hierarchical Clustering Schemes. Psychometrika 1967, 2: 241\u2013254. 10.1007\/BF02289588","journal-title":"Psychometrika"},{"key":"1401_CR5","volume-title":"Clustering algorithms","author":"JA Hartigan","year":"1975","unstructured":"Hartigan JA: Clustering algorithms. 
New York: John Wiley & Sons, Inc; 1975."},{"key":"1401_CR6","doi-asserted-by":"publisher","first-page":"2907","DOI":"10.1073\/pnas.96.6.2907","volume":"96","author":"P Tamayo","year":"1999","unstructured":"Tamayo P, Slonim D, Mesirov J, Zhu Q, Kitareewan S, Dmitrovsky E, Lander ES, Golub TR: Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation. Proc Natl Acad Sci USA 1999, 96: 2907\u20132912. 10.1073\/pnas.96.6.2907","journal-title":"Proc Natl Acad Sci USA"},{"key":"1401_CR7","volume-title":"Mixture models: inference and applications to clustering","author":"GJ McLachlan","year":"1988","unstructured":"McLachlan GJ, Basford KE: Mixture models: inference and applications to clustering. New York: Dekker; 1988."},{"key":"1401_CR8","doi-asserted-by":"publisher","first-page":"803","DOI":"10.2307\/2532201","volume":"49","author":"JD Banfield","year":"1993","unstructured":"Banfield JD, Raftery AE: Model-based Gaussian and non-Gaussian clustering. Biometrics 1993, 49: 803\u2013821. 10.2307\/2532201","journal-title":"Biometrics"},{"key":"1401_CR9","doi-asserted-by":"publisher","first-page":"611","DOI":"10.1198\/016214502760047131","volume":"97","author":"C Fraley","year":"2002","unstructured":"Fraley C, Raftery AE: Model-based clustering, discriminant analysis and density estimation. Journal of the American Statistical Association 2002, 97: 611\u2013631. 10.1198\/016214502760047131","journal-title":"Journal of the American Statistical Association"},{"key":"1401_CR10","doi-asserted-by":"publisher","first-page":"411","DOI":"10.1111\/1467-9868.00293","volume":"63","author":"R Tibshirani","year":"2001","unstructured":"Tibshirani R, Walther G, Hastie T: Estimating the number of clusters in a data set via the gap statistic. J R Statist Soc B 2001, 63: 411\u2013423. 
10.1111\/1467-9868.00293","journal-title":"J R Statist Soc B"},{"key":"1401_CR11","doi-asserted-by":"publisher","first-page":"810","DOI":"10.1021\/ci0200671","volume":"43","author":"M Feher","year":"2003","unstructured":"Feher M, Schmidt JM: Fuzzy clustering as a means of selecting representative conformers and molecular alignments. J Chem Inf Comput Sci 2003, 43: 810\u2013818. 10.1021\/ci0200671","journal-title":"J Chem Inf Comput Sci"},{"key":"1401_CR12","doi-asserted-by":"publisher","first-page":"171","DOI":"10.1016\/j.artmed.2005.02.007","volume":"35","author":"Y Okada","year":"2005","unstructured":"Okada Y, Sahara T, Mitsubayashi H, Ohgiya S, Nagashima T: Knowledge-assisted recognition of cluster boundaries in gene expression data. Artif Intell Med 2005, 35: 171\u2013183. 10.1016\/j.artmed.2005.02.007","journal-title":"Artif Intell Med"},{"key":"1401_CR13","first-page":"1081","volume":"3","author":"F Baccelli","year":"1999","unstructured":"Baccelli F, Kofman D, Rougier JL: Self organizing hierarchical multicast trees and their optimization. Proceedings of IEEE Infocom'99 1999, 3: 1081\u20131089.","journal-title":"Proceedings of IEEE Infocom'99"},{"key":"1401_CR14","volume-title":"Proceedings of the Australian Telecommunications, Networks and Applications Conference (ATNAC)","author":"L Jia","year":"2003","unstructured":"Jia L, Bagirov AM, Ouveysi I, Rubinov AM: Optimization based clustering algorithms in multicast group hierarchies. In Proceedings of the Australian Telecommunications, Networks and Applications Conference (ATNAC). Melbourne Australia; 2003. (published on CD, ISBN 0-646-42229-4)."},{"key":"1401_CR15","doi-asserted-by":"publisher","first-page":"601","DOI":"10.1089\/106652700750050961","volume":"7","author":"N Friedman","year":"2000","unstructured":"Friedman N, Linial M, Nachman I, Pe'er D: Using Bayesian networks to analyze expression data. J Comput Biol 2000, 7: 601\u2013620. 
10.1089\/106652700750050961","journal-title":"J Comput Biol"},{"key":"1401_CR16","doi-asserted-by":"publisher","first-page":"334","DOI":"10.1073\/pnas.95.1.334","volume":"95","author":"X Wen","year":"1998","unstructured":"Wen X, Fuhrman S, Michaels GS, Carr DB, Smith S, Barker JL, Somogyi R: Large-scale temporal gene expression mapping of central nervous system development. Proc Natl Acad Sci USA 1998, 95: 334\u2013339. 10.1073\/pnas.95.1.334","journal-title":"Proc Natl Acad Sci USA"},{"key":"1401_CR17","doi-asserted-by":"publisher","first-page":"317","DOI":"10.1089\/10665270252935485","volume":"9","author":"V Filkov","year":"2002","unstructured":"Filkov V, Skiena S, Zhi J: Analysis techniques for microarray time-series data. J Comput Biol 2002, 9: 317\u2013330. 10.1089\/10665270252935485","journal-title":"J Comput Biol"},{"key":"1401_CR18","doi-asserted-by":"publisher","first-page":"1069","DOI":"10.1093\/bioinformatics\/bti095","volume":"21","author":"R Balasubramaniyan","year":"2005","unstructured":"Balasubramaniyan R, Hullermeier E, Weskamp N, Kamper J: Clustering of gene expression data using a local shape-based similarity measure. Bioinformatics 2005, 21: 1069\u20131077. 10.1093\/bioinformatics\/bti095","journal-title":"Bioinformatics"},{"key":"1401_CR19","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4757-1904-8","volume-title":"Principal Component Analysis","author":"IT Jolliffe","year":"1986","unstructured":"Jolliffe IT: Principal Component Analysis. New York: Springer-Verlag; 1986."},{"key":"1401_CR20","doi-asserted-by":"publisher","first-page":"R51","DOI":"10.1186\/gb-2004-5-7-r51","volume":"5","author":"L Cai","year":"2004","unstructured":"Cai L, Huang H, Blackshaw S, Liu JS, Cepko C, Wong WH: Cluster analysis of SAGE data using a Poisson approach. Genome Biology 2004, 5: R51. 
10.1186\/gb-2004-5-7-r51","journal-title":"Genome Biology"},{"key":"1401_CR21","doi-asserted-by":"publisher","first-page":"e247","DOI":"10.1371\/journal.pbio.0020247","volume":"2","author":"S Blackshaw","year":"2004","unstructured":"Blackshaw S, Harpavat S, Trimarchi J, Cai L, Huang H, Kuo WP, Weber G, Lee K, Fraioli RE, Cho S-H, Yung R, Asch E, Ohno-Machado L, Wong WH, Cepko CL: Genomic analysis of mouse retinal development. PLoS Biology 2004, 2: e247. 10.1371\/journal.pbio.0020247","journal-title":"PLoS Biology"},{"key":"1401_CR22","doi-asserted-by":"publisher","first-page":"699","DOI":"10.1126\/science.282.5389.699","volume":"282","author":"S Chu","year":"1998","unstructured":"Chu S, DeRisi J, Eisen M, Mulholland J, Botstein D, Brown PO, Herskowitz I: The transcriptional program of sporulation in budding yeast. Science 1998, 282: 699\u2013705. 10.1126\/science.282.5389.699","journal-title":"Science"},{"key":"1401_CR23","doi-asserted-by":"publisher","first-page":"343","DOI":"10.1007\/s11103-005-4209-4","volume":"60","author":"K Jiang","year":"2006","unstructured":"Jiang K, Zhang S, Lee S, Tsai G, Kim K, Huang H, Chilcott C, Zhu T, Feldman LJ: Transcription profile analysis identify genes and pathways central to root cap functions in maize. Plant Molecular Biology 2006, 60: 343\u2013363. 10.1007\/s11103-005-4209-4","journal-title":"Plant Molecular Biology"},{"issue":"2","key":"1401_CR24","doi-asserted-by":"publisher","first-page":"research0003","DOI":"10.1186\/gb-2000-1-2-research0003","volume":"1","author":"T Hastie","year":"2000","unstructured":"Hastie T, Tibshirani R, Eisen MB, Alizadeh A, Levy R, Staudt L, Chan WC, Botstein D, Brown P: 'Gene shaving' as a method for identifying distinct sets of genes with similar expression patterns. Genome Biology 2000, 1(2):research0003. 
10.1186\/gb-2000-1-2-research0003","journal-title":"Genome Biology"},{"key":"1401_CR25","doi-asserted-by":"publisher","first-page":"411","DOI":"10.1111\/1467-9868.00293","volume":"63","author":"R Tibshirani","year":"2001","unstructured":"Tibshirani R, Walther G, Hastie T: Estimating the number of clusters in a data set via the gap statistic. J R Statist Soc B 2001, 63: 411\u2013423. 10.1111\/1467-9868.00293","journal-title":"J R Statist Soc B"},{"key":"1401_CR26","first-page":"193","volume-title":"J Classifi","author":"L Hubert","year":"1995","unstructured":"Hubert L, Arabie P: Comparing partitions. J Classifi 1995, 193\u2013218."},{"key":"1401_CR27","doi-asserted-by":"publisher","first-page":"953","DOI":"10.1093\/bioinformatics\/16.11.953","volume":"16","author":"MZ Man","year":"2000","unstructured":"Man MZ, Wang X, Wang Y: POWER_SAGE: comparing statistical tests for SAGE experiments. Bioinformatics 2000, 16: 953\u2013959. 10.1093\/bioinformatics\/16.11.953","journal-title":"Bioinformatics"},{"key":"1401_CR28","first-page":"452","volume":"5","author":"S Raychaudhuri","year":"2000","unstructured":"Raychaudhuri S, Stuart JM, Altman RB: Principal components analysis to summarize microarray experiments: application to sporulation time series. Pac Symp Biocomput 2000, 5: 452\u2013463.","journal-title":"Pac Symp Biocomput"},{"key":"1401_CR29","doi-asserted-by":"publisher","first-page":"763","DOI":"10.1093\/bioinformatics\/17.9.763","volume":"17","author":"KY Yeung","year":"2001","unstructured":"Yeung KY, Ruzzo WL: Principal component analysis for clustering gene expression data. Bioinformatics 2001, 17: 763\u2013774. 10.1093\/bioinformatics\/17.9.763","journal-title":"Bioinformatics"},{"key":"1401_CR30","doi-asserted-by":"publisher","first-page":"10101","DOI":"10.1073\/pnas.97.18.10101","volume":"97","author":"O Alter","year":"2000","unstructured":"Alter O, Brown PO, Botstein D: Singular value decomposition for genome-wide expression data processing and modeling. 
Proc Natl Acad Sci USA 2000, 97: 10101\u201310106. 10.1073\/pnas.97.18.10101","journal-title":"Proc Natl Acad Sci USA"},{"key":"1401_CR31","doi-asserted-by":"publisher","first-page":"8409","DOI":"10.1073\/pnas.150242097","volume":"97","author":"NS Holter","year":"2000","unstructured":"Holter NS, Mitra M, Maritan A, Cieplak M, Banavar JR, Fedoroff NV: Fundamental patterns underlying gene expression profiles: simplicity from complexity. Proc Natl Acad Sci USA 2000, 97: 8409\u20138414. 10.1073\/pnas.150242097","journal-title":"Proc Natl Acad Sci USA"},{"key":"1401_CR32","doi-asserted-by":"publisher","first-page":"571","DOI":"10.1093\/bioinformatics\/btg051","volume":"19","author":"S Bicciato","year":"2003","unstructured":"Bicciato S, Luchini A, Di Bello C: PCA disjoint models for multiclass cancer analysis using gene expression data. Bioinformatics 2003, 19: 571\u2013578. 10.1093\/bioinformatics\/btg051","journal-title":"Bioinformatics"},{"key":"1401_CR33","doi-asserted-by":"publisher","first-page":"1112","DOI":"10.1101\/gr.225302","volume":"12","author":"J Misra","year":"2002","unstructured":"Misra J, Schmitt W, Hwang D, Hsiao L-L, Gullans S, Stephanopoulos G: Interactive exploration of microarray gene expression patterns in a reduced dimensional space. Genome Res 2002, 12: 1112\u20131120. 10.1101\/gr.225302","journal-title":"Genome Res"},{"key":"1401_CR34","doi-asserted-by":"publisher","first-page":"439","DOI":"10.1093\/bioinformatics\/bti188","volume":"21","author":"D Komura","year":"2005","unstructured":"Komura D, Nakamura H, Tsutsumi S, Aburatani H, Ihara S: Multidimensional support vector machines for visualization of gene expression data. Bioinformatics 2005, 21: 439\u2013444. 
10.1093\/bioinformatics\/bti188","journal-title":"Bioinformatics"},{"key":"1401_CR35","doi-asserted-by":"publisher","first-page":"267","DOI":"10.2307\/2347949","volume":"32","author":"W-C Chang","year":"1983","unstructured":"Chang W-C: On using principal components before separating a mixture of two multivariate normal distributions. Appl Statist 1983, 32: 267\u2013275. 10.2307\/2347949","journal-title":"Appl Statist"},{"key":"1401_CR36","doi-asserted-by":"publisher","first-page":"S105","DOI":"10.1093\/bioinformatics\/18.suppl_1.S105","volume":"18","author":"BP Durbin","year":"2002","unstructured":"Durbin BP, Hardin JS, Hawkins DM, Rocke DM: A variance-stabilizing transformation for gene-expression microarray data. Bioinformatics 2002, 18: S105-S110.","journal-title":"Bioinformatics"},{"key":"1401_CR37","volume-title":"Heterogeneity of variance in gene expression microarray data","author":"DM Rocke","year":"2003","unstructured":"Rocke DM: Heterogeneity of variance in gene expression microarray data. University of California at Davis, Department of Applied Science and Division of Biostatistics; 2003. 
[http:\/\/www.cipic.ucdavis.edu\/~dmrocke\/papers\/empbayes2.pdf]"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-8-29.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,1]],"date-time":"2021-09-01T10:06:49Z","timestamp":1630490809000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-8-29"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2007,1,27]]},"references-count":37,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2007,12]]}},"alternative-id":["1401"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-8-29","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2007,1,27]]},"assertion":[{"value":"1 September 2006","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"27 January 2007","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"27 January 2007","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"29"}}