{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,1]],"date-time":"2025-10-01T15:36:43Z","timestamp":1759333003327},"reference-count":39,"publisher":"Springer Science and Business Media LLC","issue":"1","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2009,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:sec>\n            <jats:title>Background<\/jats:title>\n            <jats:p>Information extraction from microarrays has not yet been widely used in diagnostic or prognostic decision-support systems, due to the diversity of results produced by the available techniques, their instability on different data sets and the inability to relate statistical significance with biological relevance. Thus, there is an urgent need to address the statistical framework of microarray analysis and identify its drawbacks and limitations, which will enable us to thoroughly compare methodologies under the same experimental set-up and associate results with confidence intervals meaningful to clinicians. In this study we consider gene-selection algorithms with the aim to reveal inefficiencies in performance evaluation and address aspects that can reduce uncertainty in algorithmic validation.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Results<\/jats:title>\n            <jats:p>A computational study is performed related to the performance of several gene selection methodologies on publicly available microarray data. Three basic types of experimental scenarios are evaluated, i.e. the independent test-set and the 10-fold cross-validation (CV) using maximum and average performance measures. Feature selection methods behave differently under different validation strategies. 
The performance results from CV do not match well those from the independent test-set, except for the support vector machines (SVM) and the least squares SVM methods. However, these wrapper methods achieve variable (often low) performance, whereas the hybrid methods attain consistently higher accuracies. The use of an independent test-set within CV is important for the evaluation of the predictive power of algorithms. The optimal size of the selected gene-set also appears to be dependent on the evaluation scheme. The consistency of selected genes over variation of the training-set is another aspect important in reducing uncertainty in the evaluation of the derived gene signature. In all cases the presence of outlier samples can seriously affect algorithmic performance.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Conclusion<\/jats:title>\n            <jats:p>Multiple parameters can influence the selection of a gene-signature and its predictive power, thus possible biases in validation methods must always be accounted for. This paper illustrates that independent test-set evaluation reduces the bias of CV, and case-specific measures reveal stability characteristics of the gene-signature over changes of the training set. Moreover, frequency measures on gene selection address the algorithmic consistency in selecting the same gene signature under different training conditions. These issues contribute to the development of an objective evaluation framework and aid the derivation of statistically consistent gene signatures that could eventually be correlated with biological relevance. 
The benefits of the proposed framework are supported by the evaluation results and methodological comparisons performed for several gene-selection algorithms on three publicly available datasets.<\/jats:p>\n          <\/jats:sec>","DOI":"10.1186\/1471-2105-10-53","type":"journal-article","created":{"date-parts":[[2009,2,7]],"date-time":"2009-02-07T19:13:04Z","timestamp":1234033984000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":26,"title":["Outcome prediction based on microarray analysis: a critical perspective on methods"],"prefix":"10.1186","volume":"10","author":[{"given":"Michalis","family":"Zervakis","sequence":"first","affiliation":[]},{"given":"Michalis E","family":"Blazadonakis","sequence":"additional","affiliation":[]},{"given":"Georgia","family":"Tsiliki","sequence":"additional","affiliation":[]},{"given":"Vasiliki","family":"Danilatou","sequence":"additional","affiliation":[]},{"given":"Manolis","family":"Tsiknakis","sequence":"additional","affiliation":[]},{"given":"Dimitris","family":"Kafetzopoulos","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2009,2,7]]},"reference":[{"key":"2783_CR1","first-page":"1","volume":"381","author":"H Seliger","year":"2007","unstructured":"Seliger H: Introduction: array technology \u2013 an overview. Methods Mol Biol 2007, 381: 1\u201336.","journal-title":"Methods Mol Biol"},{"key":"2783_CR2","doi-asserted-by":"publisher","first-page":"1599","DOI":"10.1038\/sj.bjc.6601326","volume":"89","author":"R Simon","year":"2003","unstructured":"Simon R: Diagnostic and Prognostic Prediction Using Gene Expression Profiles in High-Dimensional Microarray Data. 
British Journal of Cancer 2003, 89: 1599\u20131604.","journal-title":"British Journal of Cancer"},{"key":"2783_CR3","doi-asserted-by":"publisher","first-page":"673","DOI":"10.1038\/89044","volume":"7","author":"J Khan","year":"2001","unstructured":"Khan J, Wei JS, Ringn\u00e9r M, Saal LH, Ladanyi M, Westermann F, Bardhold F, Schwab M, Antonescu CR, Peterson C, Meltzer PS: Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nature Medicine 2001, 7: 673\u2013679.","journal-title":"Nature Medicine"},{"key":"2783_CR4","doi-asserted-by":"publisher","first-page":"1165","DOI":"10.1126\/science.1125948","volume":"312","author":"WS Dalton","year":"2006","unstructured":"Dalton WS, Friend SH: Cancer Biomarkers An Invitation to the Table. Science 2006, 312: 1165\u20131168.","journal-title":"Science"},{"key":"2783_CR5","doi-asserted-by":"publisher","first-page":"543","DOI":"10.1186\/1471-2105-7-543","volume":"7","author":"S Niijima","year":"2006","unstructured":"Niijima S, Kuhara S: Recursive gene selection based on maximum margin criterion: a comparison with SVM-RFE. BMC Bioinformatics 2006, 7: 543.","journal-title":"BMC Bioinformatics"},{"key":"2783_CR6","doi-asserted-by":"publisher","first-page":"91","DOI":"10.1016\/j.artmed.2004.01.007","volume":"31","author":"I Inza","year":"2004","unstructured":"Inza I, Larranaga P, Blanco R, Cerrolaza AJ: Filter versus wrapper gene selection approaches in DNA microarray domains. Artificial Intelligence in Medicine 2004, 31: 91\u2013103.","journal-title":"Artificial Intelligence in Medicine"},{"key":"2783_CR7","first-page":"9","volume-title":"BMC Genomics","author":"M Pirooznia","year":"2008","unstructured":"Pirooznia M, Yang JY, Yang MQ, Deng Y: A comparative study of different machine learning methods on microarray gene expression data. BMC Genomics 2008, 9. doi:10.1186\/1471\u20132164\u20139-S1-S13. 
doi:10.1186\/1471-2164-9-S1-S13."},{"key":"2783_CR8","doi-asserted-by":"publisher","first-page":"3741","DOI":"10.1093\/bioinformatics\/bti618","volume":"21","author":"F Li","year":"2005","unstructured":"Li F, Yang Y: Analysis of recursive gene selection approaches from microarray data. Bioinformatics 2005, 21: 3741\u20133747.","journal-title":"Bioinformatics"},{"key":"2783_CR9","doi-asserted-by":"publisher","first-page":"531","DOI":"10.1126\/science.286.5439.531","volume":"286","author":"TR Golub","year":"1999","unstructured":"Golub TR, Slonim K, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA, Bloomfield CD, Lande ES: Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring. Science 1999, 286: 531\u2013536.","journal-title":"Science"},{"key":"2783_CR10","doi-asserted-by":"publisher","first-page":"389","DOI":"10.1023\/A:1012487302797","volume":"36","author":"I Guyon","year":"2002","unstructured":"Guyon I, Weston J, Barnhill S, Vapnik V: Gene selection for cancer classification using Support vector machines. machine learning 2002, 36: 389\u2013422.","journal-title":"machine learning"},{"key":"2783_CR11","doi-asserted-by":"publisher","first-page":"55","DOI":"10.1038\/nrg1749","volume":"7","author":"DB Allison","year":"2006","unstructured":"Allison DB, Cui X, Page GP, Sabripour M: Microarray data analysis: from disarray to consolidation and consensus. Nature Reviews 2006, 7: 55\u201365.","journal-title":"Nature Reviews"},{"key":"2783_CR12","doi-asserted-by":"publisher","first-page":"418","DOI":"10.1038\/35076576","volume":"2","author":"J Quackenbush","year":"2001","unstructured":"Quackenbush J: Computational Analysis of Microarray data. 
Nature Reviews 2001, 2: 418\u2013427.","journal-title":"Nature Reviews"},{"key":"2783_CR13","first-page":"111","volume":"224","author":"GK Smyth","year":"2003","unstructured":"Smyth GK, Yang YH, Speed T: Statistical Issues in cDNA Microarray Data Analysis. Methods in Molecular Biology 2003, 224: 111\u2013136.","journal-title":"Methods in Molecular Biology"},{"key":"2783_CR14","doi-asserted-by":"crossref","first-page":"579","DOI":"10.1038\/nrg863","volume":"3","author":"YH Yang","year":"2002","unstructured":"Yang YH, Speed T: Design Issues for cDNA Microarray Experiments. Nature Reviews 2002, 3: 579\u2013588.","journal-title":"Nature Reviews"},{"key":"2783_CR15","doi-asserted-by":"publisher","first-page":"6745","DOI":"10.1073\/pnas.96.12.6745","volume":"96","author":"U Alon","year":"1999","unstructured":"Alon U, Barkai N, Notterman D, Ybarra S, Mack D, Levine AJ: Broad patterns of gene expression revealed by clustering analysis of tumor and normal cancer tissues proposed by oligonucleotide arrays. PNAS 1999, 96: 6745\u20136750.","journal-title":"PNAS"},{"issue":"1","key":"2783_CR16","doi-asserted-by":"crossref","first-page":"Article8","DOI":"10.2202\/1544-6115.1322","volume":"7","author":"W Jiang","year":"2008","unstructured":"Jiang W, Varma S, Simon R: Calculating Confidence Intervals for Prediction Error in Microarray Classification Using Resampling. Stat Appl Genet Mol Biol 2008, 7(1):Article8.","journal-title":"Stat Appl Genet Mol Biol"},{"key":"2783_CR17","doi-asserted-by":"publisher","first-page":"415","DOI":"10.1186\/1471-2105-8-415","volume":"8","author":"M Gormley","year":"2007","unstructured":"Gormley M, Dampier W, Ertel A, Karacali B, Tozeren A: Prediction Potential of Candidate Biomarker Sets Identified and Validated on Gene Expression Data from Multiple Data sets. 
BMC Bioinformatics 2007, 8: 415.","journal-title":"BMC Bioinformatics"},{"key":"2783_CR18","doi-asserted-by":"publisher","first-page":"488","DOI":"10.1016\/S0140-6736(05)17866-0","volume":"365","author":"S Michiels","year":"2005","unstructured":"Michiels S, Koscielny S, Hill C: Prediction of cancer outcome with microarrays: a multiple random validation strategy. Lancet 2005, 365: 488\u2013492.","journal-title":"Lancet"},{"key":"2783_CR19","doi-asserted-by":"publisher","first-page":"407","DOI":"10.1186\/1471-2105-7-407","volume":"7","author":"SG Baker","year":"2006","unstructured":"Baker SG, Kramer BS: Identifying genes that contribute more to good classification in microarrays. BMC Bioinformatics 2006, 7: 407.","journal-title":"BMC Bioinformatics"},{"issue":"15","key":"2783_CR20","doi-asserted-by":"publisher","first-page":"5923","DOI":"10.1073\/pnas.0601231103","volume":"103","author":"L Ein-Dor","year":"2006","unstructured":"Ein-Dor L, Domany E: Thousands of Samples are Needed to Generate a Robust Gene List for Predicting Outcome in Cancer. PNAS 2006, 103(15):5923\u20135928.","journal-title":"PNAS"},{"key":"2783_CR21","doi-asserted-by":"publisher","first-page":"147","DOI":"10.1093\/jnci\/djk018","volume":"99","author":"A Dupuy","year":"2007","unstructured":"Dupuy A, Simon R: Critical Review of Published Microarray Studies for Cancer Outcome and Guidelines on Statistical Analysis and Reporting. J Natl Cancer Inst 2007, 99: 147\u2013157.","journal-title":"J Natl Cancer Inst"},{"key":"2783_CR22","doi-asserted-by":"publisher","first-page":"171","DOI":"10.1093\/bioinformatics\/bth469","volume":"21","author":"L Ein-Dor","year":"2005","unstructured":"Ein-Dor L, Kela I, Getz G, Givol D, Domany E: Outcome signature genes in breast cancer: is there a unique set? 
Bioinformatics 2005, 21: 171\u2013178.","journal-title":"Bioinformatics"},{"issue":"6871","key":"2783_CR23","doi-asserted-by":"publisher","first-page":"530","DOI":"10.1038\/415530a","volume":"415","author":"LJ Van't Veer","year":"2002","unstructured":"Van't Veer LJ, Dai H, Vijver MJ, He YD, Augustinus AM, Mao Mao, Peterse HL, Kooy Karin, Marton MJ, Witteven AT, Schreiber GJ, Kerkhoven RM, Roberts C, Linsley PS, Bernards R, Friend SH: Gene expression profiling predicts clinical outcome of breast cancer. Nature 2002, 415(6871):530\u2013536.","journal-title":"Nature"},{"key":"2783_CR24","doi-asserted-by":"publisher","first-page":"301","DOI":"10.1634\/theoncologist.12-3-301","volume":"12","author":"JP Ioannidis","year":"2007","unstructured":"Ioannidis JP: Is Molecular Profiling Ready for Use in Clinical Decision-making? The Oncologist 2007, 12: 301\u2013311.","journal-title":"The Oncologist"},{"key":"2783_CR25","volume-title":"Statistical Analysis with Missing Data","author":"A Little","year":"1987","unstructured":"Little A, Rubin D: Statistical Analysis with Missing Data. Wiley Series in Probability and Mathematical Statistics; 1987."},{"issue":"8","key":"2783_CR26","doi-asserted-by":"publisher","first-page":"894","DOI":"10.1016\/j.compbiomed.2008.05.005","volume":"38","author":"M Blazadonakis","year":"2008","unstructured":"Blazadonakis M, Zervakis M: Wrapper Filtering Criteria Via a Linear Neuron and Kernel Approaches. Comput Biol Med 2008, 38(8):894\u2013912.","journal-title":"Comput Biol Med"},{"issue":"1","key":"2783_CR27","doi-asserted-by":"publisher","first-page":"93","DOI":"10.1093\/bioinformatics\/btg382","volume":"20","author":"J Goeman","year":"2003","unstructured":"Goeman J, Geer S, de Koort F, Van Houwelingen H: A global test for groups of genes: testing association with clinical outcome. 
Bioinformatics 2003, 20(1):93\u201399.","journal-title":"Bioinformatics"},{"key":"2783_CR28","doi-asserted-by":"publisher","first-page":"631","DOI":"10.1093\/bioinformatics\/bti033","volume":"21","author":"A Statnikov","year":"2005","unstructured":"Statnikov A, Aliferis CF, Tsamardinos I, Hardin D, Levy S: A comprehensive evaluation of multicategory classification methods for microarray gene expression cancer diagnosis. Bioinformatics 2005, 21: 631\u2013643.","journal-title":"Bioinformatics"},{"key":"2783_CR29","doi-asserted-by":"publisher","first-page":"491","DOI":"10.1016\/j.ijmedinf.2005.05.002","volume":"74","author":"A Statnikov","year":"2005","unstructured":"Statnikov A, Tsamardinos I, Dosbayev Y, Aliferis CF: GEMS: A system for automated cancer diagnosis and biomarker discovery from microarray gene expression data. International Journal of Medical Informatics 2005, 74: 491\u2013503.","journal-title":"International Journal of Medical Informatics"},{"issue":"10","key":"2783_CR30","doi-asserted-by":"publisher","first-page":"6562","DOI":"10.1073\/pnas.102102699","volume":"99","author":"C Ambroise","year":"2002","unstructured":"Ambroise C, McLachlan GL: Selection bias in gene extraction on the basis of microarray gene-expression data. Proc Natl Acad Sci USA 2002, 99(10):6562\u20136566.","journal-title":"Proc Natl Acad Sci USA"},{"key":"2783_CR31","doi-asserted-by":"publisher","first-page":"14","DOI":"10.1093\/jnci\/95.1.14","volume":"95","author":"R Simon","year":"2003","unstructured":"Simon R, Radmacher MD, Dobbin K, McShane LM: Pitfalls in the use of DNA microarray data for diagnostic and prognostic classification. J. Natl Cancer Institute 2003, 95: 14\u201318.","journal-title":"J. 
Natl Cancer Institute"},{"issue":"1","key":"2783_CR32","doi-asserted-by":"publisher","first-page":"56","DOI":"10.1093\/nar\/gki144","volume":"33","author":"Y Tan","year":"2005","unstructured":"Tan Y, Shi L, Tong W, Wang C: Multi-class cancer classification by total principal component regression (TPCR) using microarray gene expression data. Nucleic Acid Res 2005, 33(1):56\u201365.","journal-title":"Nucleic Acid Res"},{"key":"2783_CR33","doi-asserted-by":"publisher","first-page":"1112","DOI":"10.1101\/gr.225302","volume":"12","author":"J Misra","year":"2002","unstructured":"Misra J, Schmitt W, Hwang D, Hsiao L, Gullans S, Stephanopoulos G, Stephanopoulos Gr: Interactive Exploration of Microarray Gene Expression Patterns in a Reduced Dimensional Space. Genome Research 2002, 12: 1112\u20131120.","journal-title":"Genome Research"},{"key":"2783_CR34","doi-asserted-by":"publisher","first-page":"77","DOI":"10.1016\/j.jchromb.2007.10.042","volume":"866","author":"S Smit","year":"2008","unstructured":"Smit S, Hoefsloot H, Smilde A: Statistical Data Processing in Clinical Proteomics. Journal of Chromatography B 2008, 866: 77\u201388.","journal-title":"Journal of Chromatography B"},{"issue":"9458","key":"2783_CR35","doi-asserted-by":"publisher","first-page":"354","DOI":"10.1016\/S0140-6736(05)70249-X","volume":"365","author":"J Ioannidis","year":"2005","unstructured":"Ioannidis J: Microarrays and molecular research: noise discovery? Lancet 2005, 365(9458):354\u2013355.","journal-title":"Lancet"},{"key":"2783_CR36","doi-asserted-by":"publisher","first-page":"91","DOI":"10.1186\/1471-2105-7-91","volume":"7","author":"S Varma","year":"2006","unstructured":"Varma S, Simon R: Bias in Error Estimation when using Cross-Validation for Model Selection. 
BMC Bioinformatics 2006, 7: 91.","journal-title":"BMC Bioinformatics"},{"issue":"1","key":"2783_CR37","doi-asserted-by":"publisher","first-page":"22","DOI":"10.1016\/j.cmpb.2008.02.009","volume":"91","author":"M Blazadonakis","year":"2008","unstructured":"Blazadonakis M, Zervakis M: The Linear Neuron as Marker Selector and Clinical Predictor. Comput Methods Programs Biomed 2008, 91(1):22\u201335.","journal-title":"Comput Methods Programs Biomed"},{"key":"2783_CR38","volume-title":"The Nature of Statistical Learning Theory","author":"NV Vapnik","year":"1999","unstructured":"Vapnik NV: The Nature of Statistical Learning Theory. Springer-Verlag New York; 1999."},{"key":"2783_CR39","doi-asserted-by":"publisher","DOI":"10.1142\/9789812776655","volume-title":"Least Square Support Vector Machines","author":"JA Suykens","year":"2002","unstructured":"Suykens JA, Gestel TV, De Brabanter J, De Moor B, Vandewalle J: Least Square Support Vector Machines. World Scientific Publishing; 2002."}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-10-53.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,8,31]],"date-time":"2021-08-31T21:40:24Z","timestamp":1630446024000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-10-53"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2009,2,7]]},"references-count":39,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2009,12]]}},"alternative-id":["2783"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-10-53","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2009,2,7]]},"assertion":[{"value":"4 August 
2008","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"7 February 2009","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"7 February 2009","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"53"}}