{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,15]],"date-time":"2025-12-15T13:56:56Z","timestamp":1765807016201},"reference-count":46,"publisher":"Springer Science and Business Media LLC","issue":"1","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2011,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:sec>\n            <jats:title>Background<\/jats:title>\n            <jats:p>Machine learning is a powerful approach for describing and predicting classes in microarray data. Although several comparative studies have investigated the relative performance of various machine learning methods, these often do not account for the fact that performance (e.g. error rate) is a result of a series of analysis steps of which the most important are data normalization, gene selection and machine learning.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Results<\/jats:title>\n            <jats:p>In this study, we used seven previously published cancer-related microarray data sets to compare the effects on classification performance of five normalization methods, three gene selection methods with 21 different numbers of selected genes and eight machine learning methods. Performance in terms of error rate was rigorously estimated by repeatedly employing a double cross validation approach. Since performance varies greatly between data sets, we devised an analysis method that first compares methods within individual data sets and then visualizes the comparisons across data sets. 
We discovered both well performing individual methods and synergies between different methods.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Conclusion<\/jats:title>\n            <jats:p>Support Vector Machines with a radial basis kernel, linear kernel or polynomial kernel of degree 2 all performed consistently well across data sets. We show that there is a synergistic relationship between these methods and gene selection based on the T-test and the selection of a relatively high number of genes. Also, we find that these methods benefit significantly from using normalized data, although it is hard to draw general conclusions about the relative performance of different normalization procedures.<\/jats:p>\n          <\/jats:sec>","DOI":"10.1186\/1471-2105-12-390","type":"journal-article","created":{"date-parts":[[2011,12,3]],"date-time":"2011-12-03T18:18:56Z","timestamp":1322936336000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":26,"title":["Classification of microarrays; synergistic effects between normalization, gene selection and machine learning"],"prefix":"10.1186","volume":"12","author":[{"given":"Jenny","family":"\u00d6nskog","sequence":"first","affiliation":[]},{"given":"Eva","family":"Freyhult","sequence":"additional","affiliation":[]},{"given":"Mattias","family":"Landfors","sequence":"additional","affiliation":[]},{"given":"Patrik","family":"Ryd\u00e9n","sequence":"additional","affiliation":[]},{"given":"Torgeir R","family":"Hvidsten","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2011,10,7]]},"reference":[{"key":"4948_CR1","volume-title":"Batch Effects and Noise in Microarray Experiments: Sources and Solutions","author":"J Fahl\u00e9n","year":"2009","unstructured":"Fahl\u00e9n J, Landfors M, Freyhult E, Trygg J, Hvidsten TR, Ryd\u00e9n P: Bioinformatic strategies for cDNA-microarray data processing. 
Batch Effects and Noise in Microarray Experiments: Sources and Solutions. Edited by: Scherer A. 2009, John Wiley & Sons"},{"issue":"20","key":"4948_CR2","doi-asserted-by":"publisher","first-page":"2700","DOI":"10.1093\/bioinformatics\/btm412","volume":"23","author":"ME Ritchie","year":"2007","unstructured":"Ritchie ME, Silver J, Oshlack A, Holmes M, Diyagama D, Holloway A, Smyth GK: A comparison of background correction methods for two-colour microarrays. Bioinformatics. 2007, 23 (20): 2700-2707. 10.1093\/bioinformatics\/btm412.","journal-title":"Bioinformatics"},{"issue":"Suppl","key":"4948_CR3","doi-asserted-by":"publisher","first-page":"496","DOI":"10.1038\/ng1032","volume":"32","author":"J Quackenbush","year":"2002","unstructured":"Quackenbush J: Microarray data normalization and transformation. Nat Genet. 2002, 32 (Suppl): 496-501.","journal-title":"Nat Genet"},{"key":"4948_CR4","doi-asserted-by":"publisher","first-page":"300","DOI":"10.1186\/1471-2105-7-300","volume":"7","author":"P Ryden","year":"2006","unstructured":"Ryden P, Andersson H, Landfors M, Naslund L, Hartmanova B, Noppa L, Sjostedt A: Evaluation of microarray data normalization procedures using spike-in experiments. BMC Bioinformatics. 2006, 7: 300-10.1186\/1471-2105-7-300.","journal-title":"BMC Bioinformatics"},{"issue":"18","key":"4948_CR5","doi-asserted-by":"publisher","first-page":"5471","DOI":"10.1093\/nar\/gkh866","volume":"32","author":"LX Qin","year":"2004","unstructured":"Qin LX, Kerr KF: Empirical evaluation of data transformations and ranking statistics for microarray analysis. Nucleic Acids Res. 2004, 32 (18): 5471-5479. 10.1093\/nar\/gkh866.","journal-title":"Nucleic Acids Res"},{"key":"4948_CR6","doi-asserted-by":"publisher","first-page":"134","DOI":"10.1186\/1471-2105-7-134","volume":"7","author":"SY Kim","year":"2006","unstructured":"Kim SY, Lee JW, Bae JS: Effect of data normalization on fuzzy clustering of DNA microarray data. BMC Bioinformatics. 
2006, 7: 134-10.1186\/1471-2105-7-134.","journal-title":"BMC Bioinformatics"},{"key":"4948_CR7","doi-asserted-by":"crossref","unstructured":"Freyhult E, Landfors M, Onskog J, Hvidsten TR, Ryden P: Challenges in microarray class discovery: a comprehensive examination of normalization, gene selection and clustering. BMC Bioinformatics. 11: 503.","DOI":"10.1186\/1471-2105-11-503"},{"issue":"19","key":"4948_CR8","doi-asserted-by":"publisher","first-page":"2507","DOI":"10.1093\/bioinformatics\/btm344","volume":"23","author":"Y Saeys","year":"2007","unstructured":"Saeys Y, Inza I, Larranaga P: A review of feature selection techniques in bioinformatics. Bioinformatics. 2007, 23 (19): 2507-2517. 10.1093\/bioinformatics\/btm344.","journal-title":"Bioinformatics"},{"key":"4948_CR9","doi-asserted-by":"crossref","unstructured":"Duval B, Hao JK: Advances in metaheuristics for gene selection and classification of microarray data. Brief Bioinform. 11 (1): 127-141.","DOI":"10.1093\/bib\/bbp035"},{"issue":"1","key":"4948_CR10","doi-asserted-by":"publisher","first-page":"86","DOI":"10.1093\/bib\/bbk007","volume":"7","author":"P Larranaga","year":"2006","unstructured":"Larranaga P, Calvo B, Santana R, Bielza C, Galdiano J, Inza I, Lozano JA, Armananzas R, Santafe G, Perez A: Machine learning in bioinformatics. Brief Bioinform. 2006, 7 (1): 86-112. 10.1093\/bib\/bbk007.","journal-title":"Brief Bioinform"},{"issue":"Suppl 1","key":"4948_CR11","doi-asserted-by":"publisher","first-page":"S13","DOI":"10.1186\/1471-2164-9-S1-S13","volume":"9","author":"M Pirooznia","year":"2008","unstructured":"Pirooznia M, Yang JY, Yang MQ, Deng Y: A comparative study of different machine learning methods on microarray gene expression data. BMC Genomics. 
2008, 9 (Suppl 1): S13-10.1186\/1471-2164-9-S1-S13.","journal-title":"BMC Genomics"},{"issue":"8","key":"4948_CR12","doi-asserted-by":"publisher","first-page":"823","DOI":"10.1093\/hmg\/ddg093","volume":"12","author":"C Romualdi","year":"2003","unstructured":"Romualdi C, Campanaro S, Campagna D, Celegato B, Cannata N, Toppo S, Valle G, Lanfranchi G: Pattern recognition in gene expression profiling using DNA array: a comparative study of different statistical methods applied to cancer classification. Hum Mol Genet. 2003, 12 (8): 823-836. 10.1093\/hmg\/ddg093.","journal-title":"Hum Mol Genet"},{"issue":"4","key":"4948_CR13","doi-asserted-by":"publisher","first-page":"869","DOI":"10.1016\/j.csda.2004.03.017","volume":"48","author":"JW Lee","year":"2005","unstructured":"Lee JW, Lee JB, Park M, Song SH: An extensive comparison of recent classification tools applied to microarray data. Computational Statistics & Data Analysis. 2005, 48 (4): 869-885. 10.1016\/j.csda.2004.03.017.","journal-title":"Computational Statistics & Data Analysis"},{"issue":"15","key":"4948_CR14","doi-asserted-by":"publisher","first-page":"2429","DOI":"10.1093\/bioinformatics\/bth267","volume":"20","author":"T Li","year":"2004","unstructured":"Li T, Zhang C, Ogihara M: A comparative study of feature selection and multiclass classification methods for tissue classification based on gene expression. Bioinformatics. 2004, 20 (15): 2429-2437. 10.1093\/bioinformatics\/bth267.","journal-title":"Bioinformatics"},{"issue":"5","key":"4948_CR15","doi-asserted-by":"publisher","first-page":"631","DOI":"10.1093\/bioinformatics\/bti033","volume":"21","author":"A Statnikov","year":"2005","unstructured":"Statnikov A, Aliferis CF, Tsamardinos I, Hardin D, Levy S: A comprehensive evaluation of multicategory classification methods for microarray gene expression cancer diagnosis. Bioinformatics. 2005, 21 (5): 631-643. 
10.1093\/bioinformatics\/bti033.","journal-title":"Bioinformatics"},{"issue":"14","key":"4948_CR16","doi-asserted-by":"publisher","first-page":"1960","DOI":"10.1016\/j.patrec.2008.06.018","volume":"29","author":"A Isaksson","year":"2008","unstructured":"Isaksson A, Wallman M, Goransson H, Gustafsson M: Cross-validation and bootstrapping are unreliable in small sample classification. Pattern Recognition Letters. 2008, 29 (14): 1960-1965. 10.1016\/j.patrec.2008.06.018.","journal-title":"Pattern Recognition Letters"},{"issue":"22","key":"4948_CR17","doi-asserted-by":"publisher","first-page":"8859","DOI":"10.1073\/pnas.0903931106","volume":"106","author":"J Jin","year":"2009","unstructured":"Jin J: Impossibility of successful classification when useful features are rare and weak. Proc Natl Acad Sci USA. 2009, 106 (22): 8859-8864. 10.1073\/pnas.0903931106.","journal-title":"Proc Natl Acad Sci USA"},{"issue":"6981","key":"4948_CR18","doi-asserted-by":"publisher","first-page":"419","DOI":"10.1038\/nature02341","volume":"428","author":"T Poggio","year":"2004","unstructured":"Poggio T, Rifkin R, Mukherjee S, Niyogi P: General conditions for predictivity in learning theory. Nature. 2004, 428 (6981): 419-422. 10.1038\/nature02341.","journal-title":"Nature"},{"key":"4948_CR19","doi-asserted-by":"publisher","first-page":"53","DOI":"10.1186\/1471-2105-10-53","volume":"10","author":"M Zervakis","year":"2009","unstructured":"Zervakis M, Blazadonakis ME, Tsiliki G, Danilatou V, Tsiknakis M, Kafetzopoulos D: Outcome prediction based on microarray analysis: a critical perspective on methods. BMC Bioinformatics. 2009, 10: 53-10.1186\/1471-2105-10-53.","journal-title":"BMC Bioinformatics"},{"key":"4948_CR20","volume-title":"Engineering statistics","author":"DC Montgomery","year":"2007","unstructured":"Montgomery DC, Runger GC, Hubele NF: Engineering statistics. 
2007, Hoboken, N.J.: Wiley, 4","edition":"4"},{"key":"4948_CR21","volume-title":"Machine learning","author":"TM Mitchell","year":"1997","unstructured":"Mitchell TM: Machine learning. 1997, Singapore: McGraw-Hill, First","edition":"First"},{"issue":"10","key":"4948_CR22","doi-asserted-by":"publisher","first-page":"6562","DOI":"10.1073\/pnas.102102699","volume":"99","author":"C Ambroise","year":"2002","unstructured":"Ambroise C, McLachlan GJ: Selection bias in gene extraction on the basis of microarray gene-expression data. Proc Natl Acad Sci USA. 2002, 99 (10): 6562-6566. 10.1073\/pnas.102102699.","journal-title":"Proc Natl Acad Sci USA"},{"issue":"11","key":"4948_CR23","doi-asserted-by":"publisher","first-page":"1438","DOI":"10.1093\/bioinformatics\/18.11.1438","volume":"18","author":"K Dobbin","year":"2002","unstructured":"Dobbin K, Simon R: Comparison of microarray designs for class comparison and class discovery. Bioinformatics. 2002, 18 (11): 1438-1445. 10.1093\/bioinformatics\/18.11.1438.","journal-title":"Bioinformatics"},{"issue":"6769","key":"4948_CR24","doi-asserted-by":"publisher","first-page":"503","DOI":"10.1038\/35000501","volume":"403","author":"AA Alizadeh","year":"2000","unstructured":"Alizadeh AA, Eisen MB, Davis RE, Ma C, Lossos IS, Rosenwald A, Boldrick JC, Sabet H, Tran T, Yu X: Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature. 2000, 403 (6769): 503-511. 10.1038\/35000501.","journal-title":"Nature"},{"issue":"5","key":"4948_CR25","doi-asserted-by":"publisher","first-page":"R58","DOI":"10.1186\/bcr1608","volume":"8","author":"G Finak","year":"2006","unstructured":"Finak G, Sadekova S, Pepin F, Hallett M, Meterissian S, Halwani F, Khetani K, Souleimanova M, Zabolotny B, Omeroglu A: Gene expression signatures of morphologically normal breast tissue identify basal-like tumors. Breast Cancer Res. 
2006, 8 (5): R58-10.1186\/bcr1608.","journal-title":"Breast Cancer Res"},{"key":"4948_CR26","doi-asserted-by":"crossref","unstructured":"Galland F, Lacroix L, Saulnier P, Dessen P, Meduri G, Bernier M, Gaillard S, Guibourdenche J, Fournier T, Evain-Brion D: Differential gene expression profiles of invasive and non-invasive non-functioning pituitary adenomas based on microarray analysis. Endocr Relat Cancer. 17 (2): 361-371.","DOI":"10.1677\/ERC-10-0018"},{"issue":"5","key":"4948_CR27","doi-asserted-by":"publisher","first-page":"R76","DOI":"10.1186\/gb-2007-8-5-r76","volume":"8","author":"JI Herschkowitz","year":"2007","unstructured":"Herschkowitz JI, Simin K, Weigman VJ, Mikaelian I, Usary J, Hu Z, Rasmussen KE, Jones LP, Assefnia S, Chandrasekharan S: Identification of conserved gene expression features between murine mammary carcinoma models and human breast tumors. Genome Biol. 2007, 8 (5): R76-10.1186\/gb-2007-8-5-r76.","journal-title":"Genome Biol"},{"issue":"9411","key":"4948_CR28","doi-asserted-by":"publisher","first-page":"775","DOI":"10.1016\/S0140-6736(04)15693-6","volume":"363","author":"MH Jones","year":"2004","unstructured":"Jones MH, Virtanen C, Honjoh D, Miyoshi T, Satoh Y, Okumura S, Nakagawa K, Nomura H, Ishikawa Y: Two prognostically significant subtypes of high-grade lung neuroendocrine tumours independent of small-cell and large-cell neuroendocrine carcinomas identified by gene expression profiles. Lancet. 2004, 363 (9411): 775-781. 10.1016\/S0140-6736(04)15693-6.","journal-title":"Lancet"},{"issue":"19","key":"4948_CR29","doi-asserted-by":"publisher","first-page":"10869","DOI":"10.1073\/pnas.191367098","volume":"98","author":"T Sorlie","year":"2001","unstructured":"Sorlie T, Perou CM, Tibshirani R, Aas T, Geisler S, Johnsen H, Hastie T, Eisen MB, van de Rijn M, Jeffrey SS: Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proc Natl Acad Sci USA. 2001, 98 (19): 10869-10874. 
10.1073\/pnas.191367098.","journal-title":"Proc Natl Acad Sci USA"},{"issue":"4","key":"4948_CR30","doi-asserted-by":"publisher","first-page":"416","DOI":"10.1038\/nm843","volume":"9","author":"QH Ye","year":"2003","unstructured":"Ye QH, Qin LX, Forgues M, He P, Kim JW, Peng AC, Simon R, Li Y, Robles AI, Chen Y: Predicting hepatitis B virus-positive metastatic hepatocellular carcinomas using gene expression profiling and supervised machine learning. Nat Med. 2003, 9 (4): 416-423. 10.1038\/nm843.","journal-title":"Nat Med"},{"key":"4948_CR31","first-page":"111","volume":"12","author":"S Dudoit","year":"2002","unstructured":"Dudoit S, Yang YH, Callow MJ, Speed TP: Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments. Statistica Sinica. 2002, 12: 111-140.","journal-title":"Statistica Sinica"},{"issue":"4","key":"4948_CR32","doi-asserted-by":"publisher","first-page":"e15","DOI":"10.1093\/nar\/30.4.e15","volume":"30","author":"YH Yang","year":"2002","unstructured":"Yang YH, Dudoit S, Luu P, Lin DM, Peng V, Ngai J, Speed TP: Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Res. 2002, 30 (4): e15-10.1093\/nar\/30.4.e15.","journal-title":"Nucleic Acids Res"},{"key":"4948_CR33","volume-title":"ScanAlyze. User manual","author":"MB Eisen","year":"1999","unstructured":"Eisen MB: ScanAlyze. User manual. 1999"},{"key":"4948_CR34","unstructured":"Scherer A, (ed.): Bioinformatic Strategies for cDNA-Microarray Data Processing. 2009, John Wiley & Sons, Ltd"},{"key":"4948_CR35","doi-asserted-by":"crossref","unstructured":"Aittokallio T: Dealing with missing values in large-scale studies: microarray data imputation and beyond. Brief Bioinform. 
11 (2): 253-264.","DOI":"10.1093\/bib\/bbp059"},{"issue":"6","key":"4948_CR36","doi-asserted-by":"publisher","first-page":"520","DOI":"10.1093\/bioinformatics\/17.6.520","volume":"17","author":"O Troyanskaya","year":"2001","unstructured":"Troyanskaya O, Cantor M, Sherlock G, Brown P, Hastie T, Tibshirani R, Botstein D, Altman RB: Missing value estimation methods for DNA microarrays. Bioinformatics. 2001, 17 (6): 520-525. 10.1093\/bioinformatics\/17.6.520.","journal-title":"Bioinformatics"},{"key":"4948_CR37","volume-title":"Pattern recognition","author":"S Theodoridis","year":"1999","unstructured":"Theodoridis S, Koutroumbas K: Pattern recognition. 1999, San Diego, Calif.: Academic Press"},{"key":"4948_CR38","doi-asserted-by":"publisher","DOI":"10.1007\/978-0-387-21606-5","volume-title":"The elements of statistical learning: data mining, inference, and prediction","author":"T Hastie","year":"2001","unstructured":"Hastie T, Tibshirani R, Friedman JH: The elements of statistical learning: data mining, inference, and prediction. 2001, New York: Springer"},{"key":"4948_CR39","volume-title":"Statistical inference","author":"G Casella","year":"2002","unstructured":"Casella G, Berger RL: Statistical inference. 2002, Pacific Grove, Calif.: Duxbury, 2","edition":"2"},{"key":"4948_CR40","first-page":"249","volume-title":"Proc 9th International Conference on Machine Learning: 1992","author":"KKaLA Rendell","year":"1992","unstructured":"Rendell KKaLA: A practical approach to feature selection. Proc 9th International Conference on Machine Learning: 1992. 1992, 249-256."},{"issue":"10","key":"4948_CR41","doi-asserted-by":"publisher","first-page":"e1000173","DOI":"10.1371\/journal.pcbi.1000173","volume":"4","author":"A Ben-Hur","year":"2008","unstructured":"Ben-Hur A, Ong CS, Sonnenburg S, Scholkopf B, Ratsch G: Support vector machines and kernels for computational biology. PLoS Comput Biol. 
2008, 4 (10): e1000173-10.1371\/journal.pcbi.1000173.","journal-title":"PLoS Comput Biol"},{"key":"4948_CR42","doi-asserted-by":"crossref","unstructured":"Alexandros Karatzoglou DM, Hornik Kurt: Support Vector Machines in R. Journal of Statistical Software. 2006, 15 (9):","DOI":"10.18637\/jss.v015.i09"},{"key":"4948_CR43","volume-title":"Extending the linear model with R: generalized linear, mixed effects and nonparametric regression models","author":"JJ Faraway","year":"2006","unstructured":"Faraway JJ: Extending the linear model with R: generalized linear, mixed effects and nonparametric regression models. 2006, Boca Raton: Chapman & Hall\/CRC"},{"key":"4948_CR44","volume-title":"Machine learning","author":"TM Mitchell","year":"1997","unstructured":"Mitchell TM: Machine learning. 1997, New York: McGraw-Hill"},{"key":"4948_CR45","volume-title":"Extending the Linear Model with R","author":"JJ Faraway","year":"2006","unstructured":"Faraway JJ: Extending the Linear Model with R. 2006, United States of America: Chapman & Hall\/CRC, First","edition":"First"},{"issue":"1","key":"4948_CR46","doi-asserted-by":"publisher","first-page":"77","DOI":"10.1023\/B:AMAI.0000018580.96245.c6","volume":"41","author":"KS Laura Elena Raileanu","year":"2004","unstructured":"Laura Elena Raileanu KS: Theoretical Comparison between the Gini Index and Information Gain Criteria. Annals of Mathematics and Artificial Intelligence. 
2004, 41 (1): 77-93.","journal-title":"Annals of Mathematics and Artificial Intelligence"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-12-390.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,1]],"date-time":"2021-09-01T17:51:23Z","timestamp":1630518683000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-12-390"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2011,10,7]]},"references-count":46,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2011,12]]}},"alternative-id":["4948"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-12-390","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2011,10,7]]},"assertion":[{"value":"20 December 2010","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"7 October 2011","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"7 October 2011","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"390"}}