{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,2]],"date-time":"2025-11-02T16:27:08Z","timestamp":1762100828443},"reference-count":23,"publisher":"Springer Science and Business Media LLC","issue":"S6","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2008,5]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:sec>\n            <jats:title>Background<\/jats:title>\n            <jats:p>Analysis of gene expression data for tumor classification is an important application of bioinformatics methods. But it is hard to analyse gene expression data from DNA microarray experiments by commonly used classifiers, because there are only a few observations but with thousands of measured genes in the data set. Dimension reduction is often used to handle such a high dimensional problem, but it is obscured by the existence of amounts of redundant features in the microarray data set.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Results<\/jats:title>\n            <jats:p>Dimension reduction is performed by combining feature extraction with redundant gene elimination for tumor classification. A novel metric of redundancy based on DIScriminative Contribution (DISC) is proposed which estimates the feature similarity by explicitly building a linear classifier on each gene. Compared with the standard linear correlation metric, DISC takes the label information into account and directly estimates the redundancy of the discriminative ability of two given features. Based on the DISC metric, a novel algorithm named REDISC (Redundancy Elimination based on Discriminative Contribution) is proposed, which eliminates redundant genes before feature extraction and promotes performance of dimension reduction. 
Experimental results on two microarray data sets show that the REDISC algorithm is effective and reliable to improve generalization performance of dimension reduction and hence the used classifier.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Conclusion<\/jats:title>\n            <jats:p>Dimension reduction by performing redundant gene elimination before feature extraction is better than that with only feature extraction for tumor classification, and redundant gene elimination in a supervised way is superior to the commonly used unsupervised method like linear correlation coefficients.<\/jats:p>\n          <\/jats:sec>","DOI":"10.1186\/1471-2105-9-s6-s8","type":"journal-article","created":{"date-parts":[[2008,5,28]],"date-time":"2008-05-28T18:15:46Z","timestamp":1211998546000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":21,"title":["Dimension reduction with redundant gene elimination for tumor classification"],"prefix":"10.1186","volume":"9","author":[{"given":"Xue-Qiang","family":"Zeng","sequence":"first","affiliation":[]},{"given":"Guo-Zheng","family":"Li","sequence":"additional","affiliation":[]},{"given":"Jack Y","family":"Yang","sequence":"additional","affiliation":[]},{"given":"Mary Qu","family":"Yang","sequence":"additional","affiliation":[]},{"given":"Geng-Feng","family":"Wu","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2008,5,28]]},"reference":[{"issue":"5439","key":"2619_CR1","first-page":"531","volume":"286","author":"TR Golub","year":"1999","unstructured":"Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA, Bloomfield CD, Lander ES: Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression. Bioinformatics & Computational Biology. 
1999, 286 (5439): 531-537.","journal-title":"Bioinformatics & Computational Biology"},{"key":"2619_CR2","first-page":"6745","volume-title":"Proceedings of the National Academy of Sciences of the United States of America","author":"U Alon","year":"1999","unstructured":"Alon U, Barkai N, Notterman DA, Gish K, Ybarra S, Mack D, Levine AJ: Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proceedings of the National Academy of Sciences of the United States of America. 1999, 6745-6750. 10.1073\/pnas.96.12.6745."},{"issue":"5","key":"2619_CR3","doi-asserted-by":"publisher","first-page":"563","DOI":"10.1093\/bioinformatics\/btg062","volume":"19","author":"A Antoniadis","year":"2003","unstructured":"Antoniadis A, Lambert-Lacroix S, Leblanc F: Effective dimension reduction methods for tumor classification using gene expression data. Bioinformatics. 2003, 19 (5): 563-570. 10.1093\/bioinformatics\/btg062.","journal-title":"Bioinformatics"},{"issue":"3","key":"2619_CR4","doi-asserted-by":"publisher","first-page":"407","DOI":"10.1016\/j.csda.2003.08.001","volume":"46","author":"DV Nguyen","year":"2004","unstructured":"Nguyen DV, David DM, Rocke M: On partial least squares dimension reduction for microarray-based classification: a simulation study. Computational Statistics & Data Analysis. 2004, 46 (3): 407-425. 10.1016\/j.csda.2003.08.001.","journal-title":"Computational Statistics & Data Analysis"},{"key":"2619_CR5","doi-asserted-by":"publisher","first-page":"Article 6","DOI":"10.2202\/1544-6115.1147","volume":"5","author":"JJ Dai","year":"2006","unstructured":"Dai JJ, Lieu L, Rocke D: Dimension reduction for classification with gene expression data. Statistical Applications in Genetics and Molecular Biology. 2006, 5: Article 6-10.2202\/1544-6115.1147.","journal-title":"Statistical Applications in Genetics and Molecular Biology"},{"key":"2619_CR6","first-page":"22","volume-title":"Proc. 
10th ACM SIGKDD Conf. Knowledge Discovery and Data Mining","author":"L Yu","year":"2004","unstructured":"Yu L, Liu H: Redundancy Based Feature Selection for Microarray Data. Proc. 10th ACM SIGKDD Conf. Knowledge Discovery and Data Mining. 2004, 22-25."},{"issue":"Oct","key":"2619_CR7","first-page":"1205","volume":"5","author":"L Yu","year":"2004","unstructured":"Yu L, Liu H: Efficient Feature Selection Via Analysis of Relevance and Redundancy. Journal of Machine Learning Research. 2004, 5 (Oct): 1205-1224.","journal-title":"Journal of Machine Learning Research"},{"issue":"7\u20138","key":"2619_CR8","doi-asserted-by":"publisher","first-page":"1157","DOI":"10.1162\/153244303322753616","volume":"3","author":"I Guyon","year":"2003","unstructured":"Guyon I, Elisseeff A: An Introduction to Variable and Feature Selection. Journal of Machine Learning Research. 2003, 3 (7\u20138): 1157-1182. 10.1162\/153244303322753616.","journal-title":"Journal of Machine Learning Research"},{"key":"2619_CR9","doi-asserted-by":"publisher","first-page":"1289","DOI":"10.1162\/153244303322753670","volume":"3","author":"G Forman","year":"2003","unstructured":"Forman G: An Extensive Empirical Study of Feature Selection Metrics for Text Classification. Journal of Machine Learning Research. 2003, 3: 1289-1305. 10.1162\/153244303322753670.","journal-title":"Journal of Machine Learning Research"},{"issue":"6","key":"2619_CR10","doi-asserted-by":"publisher","first-page":"1437","DOI":"10.1109\/TKDE.2003.1245283","volume":"15","author":"MA Hall","year":"2003","unstructured":"Hall MA, Holmes G: Benchmarking attribute selection techniques for discrete class data mining. IEEE Transactions on Knowledge and Data Engineering. 2003, 15 (6): 1437-1447. 10.1109\/TKDE.2003.1245283.","journal-title":"IEEE Transactions on Knowledge and Data Engineering"},{"key":"2619_CR11","volume-title":"Principal Component Analysis","author":"IT Jolliffe","year":"2002","unstructured":"Jolliffe IT: Principal Component Analysis. 
2002, Springer Series in Statistics, Springer, second","edition":"second"},{"issue":"3","key":"2619_CR12","doi-asserted-by":"publisher","first-page":"735","DOI":"10.1137\/0905052","volume":"5","author":"S Wold","year":"1984","unstructured":"Wold S, Ruhe A, Wold H, Dunn W: Collinearity problem in linear regression. The partial least squares (PLS) approach to generalized inverses. SIAM Journal of Scientific and Statistical Computations. 1984, 5 (3): 735-743. 10.1137\/0905052.","journal-title":"SIAM Journal of Scientific and Statistical Computations"},{"key":"2619_CR13","volume-title":"Briefings in Bioinformatics","author":"AL Boulesteix","year":"2006","unstructured":"Boulesteix AL, Strimmer K: Partial Least Squares: A Versatile Tool for the Analysis of High-Dimensional Genomic Data. Briefings in Bioinformatics. 2006"},{"issue":"9","key":"2619_CR14","doi-asserted-by":"publisher","first-page":"1216","DOI":"10.1093\/bioinformatics\/18.9.1216","volume":"18","author":"DV Nguyen","year":"2002","unstructured":"Nguyen DV, Rocke DM: Multi-class cancer classification via partial least squares with gene expression profiles. Bioinformatics. 2002, 18 (9): 1216-1226. 10.1093\/bioinformatics\/18.9.1216.","journal-title":"Bioinformatics"},{"key":"2619_CR15","doi-asserted-by":"publisher","first-page":"39","DOI":"10.1093\/bioinformatics\/18.1.39","volume":"18","author":"DV Nguyen","year":"2002","unstructured":"Nguyen DV, Rocke DM: Tumor classification by partial least squares using microarray gene expression data. Bioinformatics. 2002, 18: 39-50. 10.1093\/bioinformatics\/18.1.39.","journal-title":"Bioinformatics"},{"key":"2619_CR16","volume-title":"An Introduction to Support Vector Machines","author":"N Cristianini","year":"2000","unstructured":"Cristianini N, Shawe-Taylor J: An Introduction to Support Vector Machines. 
2000, Cambridge: Cambridge University Press"},{"key":"2619_CR17","doi-asserted-by":"publisher","first-page":"389","DOI":"10.1023\/A:1012487302797","volume":"46","author":"I Guyon","year":"2002","unstructured":"Guyon I, Weston J, Barnhill S, Vapnik V: Gene Selection for Cancer Classification Using Support Vector Machines. Machine Learning. 2002, 46: 389-422. 10.1023\/A:1012487302797.","journal-title":"Machine Learning"},{"issue":"5","key":"2619_CR18","doi-asserted-by":"publisher","first-page":"1630","DOI":"10.1021\/ci049869h","volume":"44","author":"Y Xue","year":"2004","unstructured":"Xue Y, Li ZR, Yap CW, Sun LZ, Chen X, Chen YZ: Effect of Molecular Descriptor Feature Selection in Support Vector Machine Classification of Pharmacokinetic and Toxicological Properties of Chemical Agents. Journal of Chemical Information & Computer Science. 2004, 44 (5): 1630-1638. 10.1021\/ci049869h.","journal-title":"Journal of Chemical Information & Computer Science"},{"issue":"6","key":"2619_CR19","doi-asserted-by":"publisher","first-page":"2478","DOI":"10.1021\/ci060128l","volume":"46","author":"S Bhavani","year":"2006","unstructured":"Bhavani S, Nagargadde A, Thawani A, Sridhar V, Chandra N: Substructure-Based Support Vector Machine Classifiers for Prediction of Adverse Effects in Diverse Classes of Drugs. Journal of Chemical Information and Modeling. 2006, 46 (6): 2478-2486. 10.1021\/ci060128l.","journal-title":"Journal of Chemical Information and Modeling"},{"key":"2619_CR20","volume-title":"Statistical Learning Theory","author":"V Vapnik","year":"1998","unstructured":"Vapnik V: Statistical Learning Theory. 1998, New York: Wiley"},{"key":"2619_CR21","volume-title":"Kent Ridge Bio-medical Data Set Repository","author":"J Li","year":"2002","unstructured":"Li J, Liu H: Kent Ridge Bio-medical Data Set Repository. 
2002, [http:\/\/www.cs.shu.edu.cn\/gzli\/data\/mirror-kentridge.html]"},{"key":"2619_CR22","doi-asserted-by":"publisher","first-page":"1895","DOI":"10.1162\/089976698300017197","volume":"10","author":"TG Dietterich","year":"1998","unstructured":"Dietterich TG: Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms. Neural Computation. 1998, 10: 1895-1923. 10.1162\/089976698300017197.","journal-title":"Neural Computation"},{"key":"2619_CR23","doi-asserted-by":"publisher","first-page":"68","DOI":"10.1186\/1471-2105-6-68","volume":"6","author":"I Levner","year":"2005","unstructured":"Levner I: Feature Selection and Nearest Centroid Classification for Protein Mass Spectrometry. BMC Bioinformatics. 2005, 6: 68-10.1186\/1471-2105-6-68.","journal-title":"BMC Bioinformatics"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-9-S6-S8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,1]],"date-time":"2021-09-01T08:19:24Z","timestamp":1630484364000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-9-S6-S8"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2008,5]]},"references-count":23,"journal-issue":{"issue":"S6","published-print":{"date-parts":[[2008,5]]}},"alternative-id":["2619"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-9-s6-s8","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2008,5]]},"assertion":[{"value":"28 May 2008","order":1,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"S8"}}