{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,29]],"date-time":"2026-03-29T08:35:41Z","timestamp":1774773341559,"version":"3.50.1"},"reference-count":24,"publisher":"Oxford University Press (OUP)","issue":"3","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2010,2,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Motivation: Biomarker discovery is an important topic in biomedical applications of computational biology, including applications such as gene and SNP selection from high-dimensional data. Surprisingly, the stability with respect to sampling variation or robustness of such selection processes has received attention only recently. However, robustness of biomarkers is an important issue, as it may greatly influence subsequent biological validations. In addition, a more robust set of markers may strengthen the confidence of an expert in the results of a selection method.<\/jats:p><jats:p>Results: Our first contribution is a general framework for the analysis of the robustness of a biomarker selection algorithm. Secondly, we conducted a large-scale analysis of the recently introduced concept of ensemble feature selection, where multiple feature selections are combined in order to increase the robustness of the final set of selected features. We focus on selection methods that are embedded in the estimation of support vector machines (SVMs). SVMs are powerful classification models that have shown state-of-the-art performance on several diagnosis and prognosis tasks on biological data. Their feature selection extensions also offered good results for gene selection tasks. We show that the robustness of SVMs for biomarker discovery can be substantially increased by using ensemble feature selection techniques, while at the same time improving upon classification performances. The proposed methodology is evaluated on four microarray datasets showing increases of up to almost 30% in robustness of the selected biomarkers, along with an improvement of \u223c15% in classification performance. The stability improvement with ensemble methods is particularly noticeable for small signature sizes (a few tens of genes), which is most relevant for the design of a diagnosis or prognosis model from a gene signature.<\/jats:p><jats:p>Contact: \u00a0yvan.saeys@psb.ugent.be<\/jats:p><jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btp630","type":"journal-article","created":{"date-parts":[[2009,11,27]],"date-time":"2009-11-27T23:31:24Z","timestamp":1259364684000},"page":"392-398","source":"Crossref","is-referenced-by-count":482,"title":["Robust biomarker identification for cancer diagnosis with ensemble feature selection methods"],"prefix":"10.1093","volume":"26","author":[{"given":"Thomas","family":"Abeel","sequence":"first","affiliation":[{"name":"1 Department of Plant Systems Biology, VIB, Technologiepark 927, 9052 Gent, 2 Department of Molecular Genetics, Ghent University, Gent, 3 Department of Computing Science and Engineering INGI and 4 Machine Learning Group, Universit\u00e9 catholique de Louvain, Louvain, Belgium"},{"name":"1 Department of Plant Systems Biology, VIB, Technologiepark 927, 9052 Gent, 2 Department of Molecular Genetics, Ghent University, Gent, 3 Department of Computing Science and Engineering INGI and 4 Machine Learning Group, Universit\u00e9 catholique de Louvain, Louvain, Belgium"}]},{"given":"Thibault","family":"Helleputte","sequence":"additional","affiliation":[{"name":"1 Department of Plant Systems Biology, VIB, Technologiepark 927, 9052 Gent, 2 Department of Molecular Genetics, Ghent University, Gent, 3 Department of Computing Science and Engineering INGI and 4 Machine Learning Group, Universit\u00e9 catholique de Louvain, Louvain, Belgium"},{"name":"1 Department of Plant Systems Biology, VIB, Technologiepark 927, 9052 Gent, 2 Department of Molecular Genetics, Ghent University, Gent, 3 Department of Computing Science and Engineering INGI and 4 Machine Learning Group, Universit\u00e9 catholique de Louvain, Louvain, Belgium"}]},{"given":"Yves","family":"Van de Peer","sequence":"additional","affiliation":[{"name":"1 Department of Plant Systems Biology, VIB, Technologiepark 927, 9052 Gent, 2 Department of Molecular Genetics, Ghent University, Gent, 3 Department of Computing Science and Engineering INGI and 4 Machine Learning Group, Universit\u00e9 catholique de Louvain, Louvain, Belgium"},{"name":"1 Department of Plant Systems Biology, VIB, Technologiepark 927, 9052 Gent, 2 Department of Molecular Genetics, Ghent University, Gent, 3 Department of Computing Science and Engineering INGI and 4 Machine Learning Group, Universit\u00e9 catholique de Louvain, Louvain, Belgium"}]},{"given":"Pierre","family":"Dupont","sequence":"additional","affiliation":[{"name":"1 Department of Plant Systems Biology, VIB, Technologiepark 927, 9052 Gent, 2 Department of Molecular Genetics, Ghent University, Gent, 3 Department of Computing Science and Engineering INGI and 4 Machine Learning Group, Universit\u00e9 catholique de Louvain, Louvain, Belgium"},{"name":"1 Department of Plant Systems Biology, VIB, Technologiepark 927, 9052 Gent, 2 Department of Molecular Genetics, Ghent University, Gent, 3 Department of Computing Science and Engineering INGI and 4 Machine Learning Group, Universit\u00e9 catholique de Louvain, Louvain, Belgium"}]},{"given":"Yvan","family":"Saeys","sequence":"additional","affiliation":[{"name":"1 Department of Plant Systems Biology, VIB, Technologiepark 927, 9052 Gent, 2 Department of Molecular Genetics, Ghent University, Gent, 3 Department of Computing Science and Engineering INGI and 4 Machine Learning Group, Universit\u00e9 catholique de Louvain, Louvain, Belgium"},{"name":"1 Department of Plant Systems Biology, VIB, Technologiepark 927, 9052 Gent, 2 Department of Molecular Genetics, Ghent University, Gent, 3 Department of Computing Science and Engineering INGI and 4 Machine Learning Group, Universit\u00e9 catholique de Louvain, Louvain, Belgium"}]}],"member":"286","published-online":{"date-parts":[[2009,11,25]]},"reference":[{"key":"2023012511002179400_B1","doi-asserted-by":"crossref","first-page":"503","DOI":"10.1038\/35000501","article-title":"Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling","volume":"403","author":"Alizadeh","year":"2000","journal-title":"Nature"},{"key":"2023012511002179400_B2","doi-asserted-by":"crossref","first-page":"6745","DOI":"10.1073\/pnas.96.12.6745","article-title":"Broad patterns of gene expression revealed by clustering of tumor and normal colon tissues probed by oligonucleotide arrays","volume":"96","author":"Alon","year":"1999","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023012511002179400_B3","doi-asserted-by":"crossref","first-page":"6562","DOI":"10.1073\/pnas.102102699","article-title":"Selection bias in gene extraction on the basis of microarray gene-expression data","volume":"99","author":"Ambroise","year":"2002","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023012511002179400_B4","doi-asserted-by":"crossref","first-page":"e1000173","DOI":"10.1371\/journal.pcbi.1000173","article-title":"Support vector machines and kernels for computational biology","volume":"4","author":"Ben-Hur","year":"2008","journal-title":"PLoS Comput. Biol."},{"key":"2023012511002179400_B5","doi-asserted-by":"crossref","first-page":"144","DOI":"10.1145\/130385.130401","article-title":"A training algorithm for optimal margin classifiers","volume-title":"Proceedings of fifth ACM workshop on computational learning theory (COLT)","author":"Boser","year":"1992"},{"key":"2023012511002179400_B6","doi-asserted-by":"crossref","first-page":"374","DOI":"10.1093\/bioinformatics\/btg419","article-title":"Is cross-validation valid for small-sample microarray classification?","volume":"20","author":"Braga-Neto","year":"2004","journal-title":"Bioinformatics"},{"key":"2023012511002179400_B7","doi-asserted-by":"crossref","first-page":"3583","DOI":"10.1093\/bioinformatics\/bth447","article-title":"Bagboosting for tumor classification with gene expression data","volume":"20","author":"Dettling","year":"2004","journal-title":"Bioinformatics"},{"key":"2023012511002179400_B8","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1007\/3-540-45014-9_1","article-title":"Ensemble methods in machine learning","volume-title":"Proceedings of the 1st International Workshop on Multiple Classifier Systems","author":"Dietterich","year":"2000"},{"key":"2023012511002179400_B9","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1214\/aos\/1176344552","article-title":"Bootstrap methods: another look at the jackknife","volume":"7","author":"Efron","year":"1979","journal-title":"Ann. Stat."},{"key":"2023012511002179400_B10","doi-asserted-by":"crossref","first-page":"531","DOI":"10.1126\/science.286.5439.531","article-title":"Molecular classification of cancer: class discovery and class prediction by gene expression monitoring","volume":"286","author":"Golub","year":"1999","journal-title":"Science"},{"key":"2023012511002179400_B11","first-page":"1157","article-title":"An introduction to variable and feature selection","volume":"3","author":"Guyon","year":"2003","journal-title":"J. Mach. Learn. Res."},{"key":"2023012511002179400_B12","doi-asserted-by":"crossref","first-page":"389","DOI":"10.1023\/A:1012487302797","article-title":"Gene selection for cancer classification using support vector machines","volume":"46","author":"Guyon","year":"2002","journal-title":"Mach. Learn."},{"key":"2023012511002179400_B13","first-page":"533","article-title":"Feature selection by transfer learning with linear regularized models","volume":"5781","author":"Helleputte","year":"2009","journal-title":"Lect. Notes Artif. Intell."},{"key":"2023012511002179400_B14","doi-asserted-by":"crossref","DOI":"10.1145\/1553374.1553427","article-title":"Partially supervised feature selection with regularized linear models","volume-title":"26th International Conference on Machine Learning (ICML).","author":"Helleputte","year":"2009"},{"key":"2023012511002179400_B15","doi-asserted-by":"crossref","first-page":"95","DOI":"10.1007\/s10115-006-0040-8","article-title":"Stability of feature selection algorithms: a study on high-dimensional spaces","volume":"12","author":"Kalousis","year":"2007","journal-title":"Knowl. Inf. Syst."},{"key":"2023012511002179400_B16","doi-asserted-by":"crossref","first-page":"299","DOI":"10.7551\/mitpress\/4057.003.0019","article-title":"Gene expression analysis: joint feature selection and classifier design","volume-title":"Kernel Methods in Computational Biology.","author":"Krishnapuram","year":"2004"},{"key":"2023012511002179400_B17","first-page":"309","article-title":"A stability index for feature selection","volume-title":"Proceedings of the 25th International Multi-Conference on Artificial Intelligence and Applications","author":"Kuncheva","year":"2007"},{"key":"2023012511002179400_B18","doi-asserted-by":"crossref","first-page":"31","DOI":"10.1023\/A:1023937123600","article-title":"Boosting and microarray data","volume":"52","author":"Long","year":"2003","journal-title":"Machine Learning"},{"key":"2023012511002179400_B19","first-page":"43","article-title":"Analysis and visualization of classifier performance: Comparison under imprecise class and cost distributions","volume-title":"Proceedings of the Third International Conference on Knowledge Discovery and Data Mining.","author":"Provost","year":"1997"},{"key":"2023012511002179400_B20","doi-asserted-by":"crossref","first-page":"2507","DOI":"10.1093\/bioinformatics\/btm344","article-title":"A review of feature selection techniques in bioinformatics","volume":"23","author":"Saeys","year":"2007","journal-title":"Bioinformatics"},{"key":"2023012511002179400_B21","doi-asserted-by":"crossref","first-page":"313","DOI":"10.1007\/978-3-540-87481-2_21","article-title":"Robust feature selection using ensemble feature selection techniques","volume-title":"Proceedings of the 25th European Conference on Machine Learning and Knowledge Discovery in Databases, Part II","author":"Saeys","year":"2008"},{"key":"2023012511002179400_B22","volume-title":"Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond","author":"Schoelkopf","year":"2002"},{"key":"2023012511002179400_B23","doi-asserted-by":"crossref","first-page":"203","DOI":"10.1016\/S1535-6108(02)00030-2","article-title":"Gene expression correlates of clinical prostate cancer behavior","volume":"1","author":"Singh","year":"2002","journal-title":"Cancer cell"},{"key":"2023012511002179400_B24","volume-title":"Exploratory Data Analysis.","author":"Tukey","year":"1977"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/26\/3\/392\/48860474\/bioinformatics_26_3_392.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/26\/3\/392\/48860474\/bioinformatics_26_3_392.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,3,17]],"date-time":"2024-03-17T21:22:58Z","timestamp":1710710578000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/26\/3\/392\/213807"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2009,11,25]]},"references-count":24,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2010,2,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btp630","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2010,2,1]]},"published":{"date-parts":[[2009,11,25]]}}}