{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,26]],"date-time":"2026-06-26T00:07:52Z","timestamp":1782432472305,"version":"3.54.5"},"reference-count":45,"publisher":"Springer Science and Business Media LLC","issue":"1","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2006,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:sec>\n            <jats:title>Background<\/jats:title>\n            <jats:p>Gene selection is an important step when building predictors of disease state based on gene expression data. Gene selection generally improves performance and identifies a relevant subset of genes. Many univariate and multivariate gene selection approaches have been proposed. Frequently the claim is made that genes are co-regulated (due to pathway dependencies) and that multivariate approaches are therefore per definition more desirable than univariate selection approaches. Based on the published performances of all these approaches a fair comparison of the available results can not be made. This mainly stems from two factors. First, the results are often biased, since the validation set is in one way or another involved in training the predictor, resulting in optimistically biased performance estimates. Second, the published results are often based on a small number of relatively simple datasets. Consequently no generally applicable conclusions can be drawn.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Results<\/jats:title>\n            <jats:p>In this study we adopted an unbiased protocol to perform a fair comparison of frequently used multivariate and univariate gene selection techniques, in combination with a range of classifiers. 
Our conclusions are based on seven gene expression datasets, across several cancer types.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Conclusion<\/jats:title>\n            <jats:p>Our experiments illustrate that, contrary to several previous studies, in five of the seven datasets univariate selection approaches yield consistently better results than multivariate approaches. The simplest multivariate selection approach, the Top Scoring method, achieves the best results on the remaining two datasets. We conclude that the correlation structures, if present, are difficult to extract due to the small number of samples, and that consequently, overly-complex gene selection algorithms that attempt to extract these structures are prone to overtraining.<\/jats:p>\n          <\/jats:sec>","DOI":"10.1186\/1471-2105-7-235","type":"journal-article","created":{"date-parts":[[2006,5,2]],"date-time":"2006-05-02T20:20:23Z","timestamp":1146601223000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":92,"title":["A comparison of univariate and multivariate gene selection techniques for classification of cancer datasets"],"prefix":"10.1186","volume":"7","author":[{"given":"Carmen","family":"Lai","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Marcel JT","family":"Reinders","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Laura J","family":"van't Veer","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Lodewyk FA","family":"Wessels","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2006,5,2]]},"reference":[{"key":"974_CR1","doi-asserted-by":"publisher","first-page":"273","DOI":"10.1016\/S0004-3702(97)00043-X","volume":"97","author":"R Kohavi","year":"1997","unstructured":"Kohavi R, John GH: Wrappers for Feature Subset Selection. Artificial Intelligence 1997, 97: 273\u2013324.","journal-title":"Artificial Intelligence"},{"key":"974_CR2","volume-title":"Ninth International Workshop on Artificial Intelligence and Statistics","author":"I Tsamardinos","year":"2003","unstructured":"Tsamardinos I, Aliferis C: Towards Principled Feature Selection: Relevancy, Filters and Wrappers. Ninth International Workshop on Artificial Intelligence and Statistics 2003."},{"key":"974_CR3","volume-title":"Bioinformatics","author":"L Ein-Dor","year":"2004","unstructured":"Ein-Dor L, Kela I, Getz G, Givol D, Domany E: Outcome signature genes in breast cancer: is there a unique set? Bioinformatics 2004., (12):"},{"key":"974_CR4","doi-asserted-by":"crossref","first-page":"54","DOI":"10.1145\/332306.332328","volume-title":"Proceedings of the fourth annual international Conference on Computational molecular biology","author":"A Ben-Dor","year":"2000","unstructured":"Ben-Dor A, Bruhn L, Friedman N, Nachman I, Schummer M, Yakhini Z: Tissue classification with gene expression profiles. In Proceedings of the fourth annual international Conference on Computational molecular biology. Tokyo, Japan: ACM Press; 2000:54\u201364."},{"issue":"8","key":"974_CR5","doi-asserted-by":"publisher","first-page":"1373","DOI":"10.1142\/S0218001404003800","volume":"18","author":"R Blanco","year":"2004","unstructured":"Blanco R, Larranaga P, Inza I, Sierra B: Gene selection for cancer classification using wrapper approaches. 
International Journal of Pattern Recognition and Artificial Intelligence 2004, 18(8):1373\u20131390.","journal-title":"International Journal of Pattern Recognition and Artificial Intelligence"},{"key":"974_CR6","doi-asserted-by":"crossref","first-page":"99","DOI":"10.1152\/physiolgenomics.2001.5.2.99","volume":"5","author":"M Chow","year":"2001","unstructured":"Chow M, Moler EJ, Mian I: Identifying marker genes in transcription profiling data using a mixture of feature relevance experts. Physiol Genomics 2001, 5: 99\u2013111.","journal-title":"Physiol Genomics"},{"issue":"5","key":"974_CR7","doi-asserted-by":"publisher","first-page":"631","DOI":"10.1093\/bioinformatics\/bti033","volume":"21","author":"A Statnikov","year":"2005","unstructured":"Statnikov A, Aliferis C, Tsamardinos I, Hardin D, Levy S: A comprehensive evaluation of multicategory classification methods for microarray gene expression cancer diagnosis. Bioinformatics 2005, 21(5):631\u2013643.","journal-title":"Bioinformatics"},{"key":"974_CR8","doi-asserted-by":"publisher","first-page":"531","DOI":"10.1126\/science.286.5439.531","volume":"286","author":"T Golub","year":"1999","unstructured":"Golub T, Slonim D, Tamayo P, Huard C, Gaasenbeek M, Mesirov J, Coller H, Loh M, Downing J, Caligiuri M, Bloomfield C, Lander E: Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 1999, 286: 531\u2013537.","journal-title":"Science"},{"key":"974_CR9","volume-title":"Pacific Symposium on Biocomputing","author":"J Jaeger","year":"2003","unstructured":"Jaeger J, Sengupta R, Ruzzo W: Improved Gene Selection For Classification Of Microarrays. 
Pacific Symposium on Biocomputing 2003."},{"issue":"4","key":"974_CR10","doi-asserted-by":"publisher","first-page":"729","DOI":"10.1016\/S0165-1684(02)00474-7","volume":"83","author":"C Bhattacharyya","year":"2003","unstructured":"Bhattacharyya C, Grate LR, Rizki A, Radisky D, Molina FJ, Jordan MI, Bissell MJ, Mian IS: Simultaneous classification and relevant feature identification in high-dimensional spaces: application to molecular profiling data. Signal Processing 2003, 83(4):729\u2013743.","journal-title":"Signal Processing"},{"key":"974_CR11","volume-title":"Proceedings of the First Asia-Pacific Bioinformatics Conference","author":"S Cho","year":"2003","unstructured":"Cho S, Won H: Machine learning in DNA microarray analysis for cancer classification. Proceedings of the First Asia-Pacific Bioinformatics Conference 2003."},{"key":"974_CR12","volume-title":"International Conference on Machine Learning","author":"E Xing","year":"2001","unstructured":"Xing E, Jordan M, Karp R: Feature selection for high-dimensional genomic microarray data. International Conference on Machine Learning 2001."},{"key":"974_CR13","volume-title":"Statistical analysis of gene expression microarray data","author":"S Dudoit","year":"2003","unstructured":"Dudoit S, Fridlyand J: Statistical analysis of gene expression microarray data. 2003. chap. 3"},{"key":"974_CR14","volume-title":"Pattern Classification","author":"RO Duda","year":"2001","unstructured":"Duda RO, Hart PE, Stork DG: Pattern Classification. second edition. New York: John Wiley & Sons, Inc.; 2001.","edition":"second"},{"key":"974_CR15","doi-asserted-by":"publisher","first-page":"239","DOI":"10.1006\/mgme.2001.3193","volume":"73","author":"M Xiong","year":"2001","unstructured":"Xiong M, Li W, Zhao J, Jin L, Boerwinkle E: Feature (Gene) Selection in Gene Expression-Based Tumor Classification. 
Molecular Genetics and Metabolism 2001, 73: 239\u2013247.","journal-title":"Molecular Genetics and Metabolism"},{"key":"974_CR16","doi-asserted-by":"publisher","first-page":"1119","DOI":"10.1016\/0167-8655(94)90127-9","volume":"15","author":"P Pudil","year":"1994","unstructured":"Pudil P, Novovicova J, Kittler J: Floating search methods in feature selection. Pattern Recognition Letters 1994, 15: 1119\u20131125.","journal-title":"Pattern Recognition Letters"},{"issue":"10","key":"974_CR17","doi-asserted-by":"publisher","first-page":"1444","DOI":"10.1016\/j.patrec.2004.11.017","volume":"26","author":"P Silva","year":"2005","unstructured":"Silva P, Hashimoto R, Kim S, Barrera J, Brandao L, Suh E, Dougherty E: Feature selection algorithms to find strong genes. Pattern Recognition Letters 2005, 26(10):1444\u20131453. [http:\/\/www.vision.ime.usp.br\/]","journal-title":"Pattern Recognition Letters"},{"issue":"11","key":"974_CR18","doi-asserted-by":"crossref","first-page":"1878","DOI":"10.1101\/gr.190001","volume":"11","author":"M Xiong","year":"2001","unstructured":"Xiong M, Fang X, Zhao J: Biomarker Identification by Feature Wrappers. Genome Research 2001, 11(11):1878\u20131887.","journal-title":"Genome Research"},{"issue":"12","key":"974_CR19","doi-asserted-by":"publisher","first-page":"1131","DOI":"10.1093\/bioinformatics\/17.12.1131","volume":"17","author":"L Li","year":"2001","unstructured":"Li L, Weinberg C, Darden T, Pedersen L: Gene selection for sample classification based on gene expression data: study of sensitivity to choice of parameters of the GA\/KNN method. Bioinformatics 2001, 17(12):1131\u20131142.","journal-title":"Bioinformatics"},{"key":"974_CR20","doi-asserted-by":"publisher","first-page":"389","DOI":"10.1023\/A:1012487302797","volume":"46","author":"I Guyon","year":"2002","unstructured":"Guyon I, Weston J, Barnhill S: Gene Selection for Cancer Classification using Support Vector Machines. 
Machine Learning 2002, 46:389\u2013422.","journal-title":"Machine Learning"},{"key":"974_CR21","volume-title":"Genome Biology","author":"T Bo","year":"2002","unstructured":"Bo T, Jonassen I: New feature subset selection procedures for classification of expression profiles. Genome Biology 2002., 3:"},{"key":"974_CR22","volume-title":"Statistical Applications in Genetics and Molecular Biology","author":"D Geman","year":"2004","unstructured":"Geman D, d'Avignon C, Naiman D, Winslow R: Classifying Gene Expression Profiles from Pairwise mRNA Comparisons. Statistical Applications in Genetics and Molecular Biology 2004., 3: [http:\/\/www.bepress.com\/sagmb\/vol3\/iss1\/art19\/]"},{"issue":"20","key":"974_CR23","doi-asserted-by":"publisher","first-page":"3905","DOI":"10.1093\/bioinformatics\/bti647","volume":"21","author":"L Xu","year":"2005","unstructured":"Xu L, Tan A, Naiman D, Geman D, Winslow R: Robust prostate cancer marker genes emerge from direct integration of inter-study microarray data. Bioinformatics 2005, 21(20):3905\u20133911.","journal-title":"Bioinformatics"},{"key":"974_CR24","volume-title":"Workshop on Algorithms in Bioinformatics","author":"L Grate","year":"2002","unstructured":"Grate L, Bhattacharyya C, Jordan M, Mian I: Simultaneous classification and relevant feature identification in high-dimensional spaces. Workshop on Algorithms in Bioinformatics 2002."},{"issue":"10","key":"974_CR25","doi-asserted-by":"publisher","first-page":"6562","DOI":"10.1073\/pnas.102102699","volume":"99","author":"C Ambroise","year":"2002","unstructured":"Ambroise C, McLachlan G: Selection bias in gene extraction on the basis of microarray gene-expression data. 
Proceedings of the National Academy of Sciences of the United States of America 2002, 99(10):6562\u20136566.","journal-title":"Proceedings of the National Academy of Sciences of the United States of America"},{"key":"974_CR26","volume-title":"Gene Selection for Cancer Classification using Support Vector Machines","author":"I Guyon","year":"2002","unstructured":"Guyon I, Weston J, Barnhill S: Gene Selection for Cancer Classification using Support Vector Machines. 2002. [http:\/\/www.clopinet.com\/isabelle\/Papers\/RFE-erratum.html]"},{"issue":"6","key":"974_CR27","doi-asserted-by":"publisher","first-page":"673","DOI":"10.1038\/89044","volume":"7","author":"J Khan","year":"2001","unstructured":"Khan J, Wei J, Ringner M, Saal L, Ladanyi M, Westermann F, Berthold F, Schwab M, Antonescu C, Peterson C, Meltzer P: Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nature Medicine 2001, 7(6):673\u2013679.","journal-title":"Nature Medicine"},{"key":"974_CR28","volume-title":"Proceedings of the Computational Systems Bioinformatics","author":"C Ding","year":"2003","unstructured":"Ding C, Peng H: Minimum Redundancy Feature Selection from Microarray Gene Expression Data. Proceedings of the Computational Systems Bioinformatics 2003."},{"key":"974_CR29","volume-title":"Bioinformatics Advanced Online Pub","author":"L Wessels","year":"2005","unstructured":"Wessels L, Reinders M, Hart A, Veenman C, Dai H, He Y, van 't Veer L: A protocol for building and evaluating predictors of disease state based on microarray data. Bioinformatics Advanced Online Pub 2005."},{"key":"974_CR30","doi-asserted-by":"publisher","first-page":"491","DOI":"10.1016\/j.ijmedinf.2005.05.002","volume":"74","author":"A Statnikov","year":"2005","unstructured":"Statnikov A, Tsamardinos I, Dosbayev Y, Aliferis C: GEMS: A System for automated cancer diagnosis and biomarker discovery from microarray gene expression data. 
International Journal of Medical Informatics 2005, 74: 491\u2013503.","journal-title":"International Journal of Medical Informatics"},{"issue":"12","key":"974_CR31","doi-asserted-by":"publisher","first-page":"6745","DOI":"10.1073\/pnas.96.12.6745","volume":"96","author":"U Alon","year":"1999","unstructured":"Alon U, Barkai N, Notterman D, Gish K, Ybarra S, Mack D, Levine A: Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proceedings of the National Academy of Sciences of the United States of America 1999, 96(12):6745\u20136750.","journal-title":"Proceedings of the National Academy of Sciences of the United States of America"},{"issue":"4","key":"974_CR32","doi-asserted-by":"publisher","first-page":"529","DOI":"10.1093\/bioinformatics\/bti032","volume":"21","author":"Z Guan","year":"2005","unstructured":"Guan Z, Zhao H: A semiparametric approach for marker gene selection based on gene expression data. Bioinformatics 2005, 21(4):529\u2013536.","journal-title":"Bioinformatics"},{"issue":"4","key":"974_CR33","doi-asserted-by":"publisher","first-page":"445","DOI":"10.1093\/bioinformatics\/bti189","volume":"21","author":"O Abul","year":"2005","unstructured":"Abul O, Alhajj R, Polat F, Barker K: Finding differentially expressed genes for pattern generation. Bioinformatics 2005, 21(4):445\u2013450.","journal-title":"Bioinformatics"},{"key":"974_CR34","volume-title":"PhD thesis","author":"M Skurichina","year":"2001","unstructured":"Skurichina M: Stabilizing weak classifiers. PhD thesis. Delft University of Technology; 2001."},{"key":"974_CR35","doi-asserted-by":"publisher","first-page":"488","DOI":"10.1016\/S0140-6736(05)17866-0","volume":"365","author":"S Michiels","year":"2005","unstructured":"Michiels S, Koscielny S, Hill C: Prediction of cancer outcome with microarrays: a multiple random validation strategy. 
The Lancet 2005, 365: 488\u2013492.","journal-title":"The Lancet"},{"key":"974_CR36","doi-asserted-by":"publisher","first-page":"530","DOI":"10.1038\/415530a","volume":"415","author":"L van 't Veer","year":"2002","unstructured":"van 't Veer L, Dai H, van de Vijver M, He YD, Hart A, Mao M, Peterse H, van der Kooy K, Marton M, Witteveen A, Schreiber G, Kerkhoven R, Roberts C, Linsley P, Bernards R, Friend S: Gene expression profiling predicts clinical outcome of breast cancer. Nature 2002, 415: 530\u2013536.","journal-title":"Nature"},{"key":"974_CR37","doi-asserted-by":"publisher","first-page":"179","DOI":"10.1111\/j.1469-1809.1936.tb02137.x","volume":"7","author":"R Fisher","year":"1936","unstructured":"Fisher R: The use of multiple measurements in taxonomic problems. Ann Eugenics 1936, 7: 179\u2013188.","journal-title":"Ann Eugenics"},{"key":"974_CR38","volume-title":"Proceedings of the European Conference on Machine Learning","author":"R Kohavi","year":"1995","unstructured":"Kohavi R: The Power of Decision Tables. Proceedings of the European Conference on Machine Learning 1995."},{"key":"974_CR39","volume-title":"PR-Tools 4.0, a Matlab toolbox for pattern recognition","author":"RPW Duin","year":"2004","unstructured":"Duin RPW, Juszczak P, de Ridder D, Paclik P, Pekalska E, Tax DMJ: PR-Tools 4.0, a Matlab toolbox for pattern recognition. Tech. rep., ICT Group, TU Delft, The Netherlands; 2004. [http:\/\/www.prtools.org]"},{"key":"974_CR40","volume-title":"PRExp 2.0, a Matlab toolbox for evaluation of pattern recognition experiment","author":"P Paclik","year":"2005","unstructured":"Paclik P, Landgrebe TCW, Duin RPW: PRExp 2.0, a Matlab toolbox for evaluation of pattern recognition experiment. 
Tech. rep., ICT Group, TU Delft, The Netherlands; 2005."},{"key":"974_CR41","doi-asserted-by":"publisher","first-page":"436","DOI":"10.1038\/415436a","volume":"415","author":"S Pomeroy","year":"2002","unstructured":"Pomeroy S, Tamayo P, Gaasenbeek M, Sturla L, Angelo M, McLaughlin M, Kim J, Goumnerova L, Black P, Lau C, Allen JC, Zagzag D, Olson J, Curran T, Wetmore C, Biegel J, Poggio T, Mukherjee S, Rifkin R, Califano A, Stolovitzky G, Louis D, Mesirov J, Lander E, Golub T: Prediction of central nervous system embryonal tumour outcome based on gene expression. Nature 2002, 415: 436\u2013442.","journal-title":"Nature"},{"key":"974_CR42","first-page":"203","volume":"1","author":"D Singh","year":"2002","unstructured":"Singh D, Febbo P, Ross K, Jackson D, Manola J, Ladd C, Tamayo P, Renshaw A, D'Amico A, Richie J, Lander E, Loda M, Kantoff P, Golub T, Sellers W: Gene expression correlates of clinical prostate cancer behavior. Cancer Cell 2002, 1: 203\u2013209.","journal-title":"Cancer Cell"},{"issue":"25","key":"974_CR43","doi-asserted-by":"publisher","first-page":"1999","DOI":"10.1056\/NEJMoa021967","volume":"347","author":"M van de Vijver","year":"2002","unstructured":"van de Vijver M, He Y, van 't Veer L, Dai H, Hart A, Voskuil D, Schreiber G, Peterse J, Roberts C, Marton M, Parrish M, Atsma D, Witteveen A, Glas A, Delahaye L, van der Velde T, Bartelink H, Rodenhuis S, Rutgers ET, Friend SH, Bernards R: A Gene-Expression Signature as a Predictor of Survival in Breast Cancer. The New England Journal of Medicine 2002, 347(25):1999\u20132009.","journal-title":"The New England Journal of Medicine"},{"key":"974_CR44","doi-asserted-by":"publisher","first-page":"503","DOI":"10.1038\/35000501","volume":"403","author":"A Alizadeh","year":"2000","unstructured":"Alizadeh A, Eisen M, Davis R, Chi Mea: Distinct Types of Diffuse Large B-Cell Lymphoma Identified by Gene Expression Profiling. 
Nature 2000, 403: 503\u2013511.","journal-title":"Nature"},{"key":"974_CR45","doi-asserted-by":"publisher","first-page":"182","DOI":"10.1038\/ng1502","volume":"37","author":"P Roepman","year":"2005","unstructured":"Roepman P, Wessels LF, Kettelarij N, Kemmeren P, Miles A, Lijnzaad P, Tilanus MG, Koole R, Hordijk G, van der Vliet P, Reinders M, Slootweg P, Holstege F: An expression profile for diagnosis of lymph node metastases from primary head and neck squamous cell carcinomas. Nature Genetics 2005, 37: 182\u2013186.","journal-title":"Nature Genetics"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-7-235.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,2]],"date-time":"2021-09-02T05:31:56Z","timestamp":1630560716000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-7-235"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2006,5,2]]},"references-count":45,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2006,12]]}},"alternative-id":["974"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-7-235","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2006,5,2]]},"assertion":[{"value":"16 September 2005","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"2 May 2006","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"2 May 2006","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"235"}}