{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,26]],"date-time":"2025-10-26T14:22:16Z","timestamp":1761488536259},"reference-count":33,"publisher":"World Scientific Pub Co Pte Lt","issue":"06","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["J. Bioinform. Comput. Biol."],"published-print":{"date-parts":[[2010,12]]},"abstract":"<jats:p> Machine learning and statistical model based classifiers have increasingly been used with more complex and high dimensional biological data obtained from high-throughput technologies. Understanding the impact of various factors associated with large and complex microarray datasets on the predictive performance of classifiers is computationally intensive, under investigated, yet vital in determining the optimal number of biomarkers for various classification purposes aimed towards improved detection, diagnosis, and therapeutic monitoring of diseases. We investigate the impact of microarray based data characteristics on the predictive performance for various classification rules using simulation studies. Our investigation using Random Forest, Support Vector Machines, Linear Discriminant Analysis and k-Nearest Neighbour shows that the predictive performance of classifiers is strongly influenced by training set size, biological and technical variability, replication, fold change and correlation between biomarkers. Optimal number of biomarkers for a classification problem should therefore be estimated taking account of the impact of all these factors. A database of average generalization errors is built for various combinations of these factors. The database of generalization errors can be used for estimating the optimal number of biomarkers for given levels of predictive accuracy as a function of these factors. Examples show that curves from actual biological data resemble that of simulated data with corresponding levels of data characteristics. An R package optBiomarker implementing the method is freely available for academic use from the Comprehensive R Archive Network (). <\/jats:p>","DOI":"10.1142\/s0219720010005063","type":"journal-article","created":{"date-parts":[[2010,9,1]],"date-time":"2010-09-01T09:25:58Z","timestamp":1283333158000},"page":"945-965","source":"Crossref","is-referenced-by-count":12,"title":["MULTI-FACTORIAL ANALYSIS OF CLASS PREDICTION ERROR: ESTIMATING OPTIMAL NUMBER OF BIOMARKERS FOR VARIOUS CLASSIFICATION RULES"],"prefix":"10.1142","volume":"08","author":[{"given":"MIZANUR R.","family":"KHONDOKER","sequence":"first","affiliation":[{"name":"Department of Biostatistics, Institute of Psychiatry and NIHR Biomedical, Research Centre for Mental Health at the South London and Maudsley NHS Foundation Trust, King's College London, Box P020, De Crespigny Park, London SE5 8AF, UK"}]},{"given":"TILL T.","family":"BACHMANN","sequence":"additional","affiliation":[{"name":"Division of Pathway Medicine (DPM), The University of Edinburgh Medical School, The Chancellor's Building, 49 Little France Crescent, Edinburgh EH16 4SB, UK"}]},{"given":"MURIEL","family":"MEWISSEN","sequence":"additional","affiliation":[{"name":"Division of Pathway Medicine (DPM), The University of Edinburgh Medical School, The Chancellor's Building, 49 Little France Crescent, Edinburgh EH16 4SB, UK"}]},{"given":"PAUL","family":"DICKINSON","sequence":"additional","affiliation":[{"name":"Division of Pathway Medicine (DPM), The University of Edinburgh Medical School, The Chancellor's Building, 49 Little France Crescent, Edinburgh EH16 4SB, UK"},{"name":"Centre for Systems Biology at Edinburgh, The University of Edinburgh, Darwin Building, King's Buildings Campus, Mayfield Road, Edinburgh EH9 3JU, UK"}]},{"given":"BARTOSZ","family":"DOBRZELECKI","sequence":"additional","affiliation":[{"name":"EPCC, The University of Edinburgh, James Clerk Maxwell Building, Mayfield Road, Edinburgh EH9 3JZ, UK"}]},{"given":"COLIN J.","family":"CAMPBELL","sequence":"additional","affiliation":[{"name":"Division of Pathway Medicine (DPM), The University of Edinburgh Medical School, The Chancellor's Building, 49 Little France Crescent, Edinburgh EH16 4SB, UK"}]},{"given":"ANDREW R.","family":"MOUNT","sequence":"additional","affiliation":[{"name":"School of Chemistry, The University of Edinburgh, Joseph Black Building, West Mains Road, Edinburgh, EH9 3JJ, UK"}]},{"given":"ANTHONY J.","family":"WALTON","sequence":"additional","affiliation":[{"name":"Institute for Integrated Micro and Nano Systems, Joint Research Institute for Integrated Systems and Scottish Microelectronics Centre, School of Engineering, The University of Edinburgh, The King's Buildings, Edinburgh EH9 3JF, UK"},{"name":"Centre for Systems Biology at Edinburgh, The University of Edinburgh, Darwin Building, King's Buildings Campus, Mayfield Road, Edinburgh EH9 3JU, UK"}]},{"given":"JASON","family":"CRAIN","sequence":"additional","affiliation":[{"name":"School of Physics and Astronomy, The University of Edinburgh, The King's Buildings, West Mains Road, Edinburgh EH9 3JZ, UK"},{"name":"National Physical Laboratory, Hampton Road, Teddington, Middlesex TW11 0LW, UK"}]},{"given":"HOLGER","family":"SCHULZE","sequence":"additional","affiliation":[{"name":"Division of Pathway Medicine (DPM), The University of Edinburgh Medical School, The Chancellor's Building, 49 Little France Crescent, Edinburgh EH16 4SB, UK"}]},{"given":"GERARD","family":"GIRAUD","sequence":"additional","affiliation":[{"name":"School of Physics and Astronomy, The University of Edinburgh, The King's Buildings, West Mains Road, Edinburgh EH9 3JZ, UK"}]},{"given":"ALAN J.","family":"ROSS","sequence":"additional","affiliation":[{"name":"Division of Pathway Medicine (DPM), The University of Edinburgh Medical School, The Chancellor's Building, 49 Little France Crescent, Edinburgh EH16 4SB, UK"}]},{"given":"ILENIA","family":"CIANI","sequence":"additional","affiliation":[{"name":"School of Chemistry, The University of Edinburgh, Joseph Black Building, West Mains Road, Edinburgh, EH9 3JJ, UK"}]},{"given":"STUART W. J.","family":"EMBER","sequence":"additional","affiliation":[{"name":"Division of Pathway Medicine (DPM), The University of Edinburgh Medical School, The Chancellor's Building, 49 Little France Crescent, Edinburgh EH16 4SB, UK"}]},{"given":"CHAKER","family":"TLILI","sequence":"additional","affiliation":[{"name":"School of Chemistry, The University of Edinburgh, Joseph Black Building, West Mains Road, Edinburgh, EH9 3JJ, UK"}]},{"given":"JONATHAN G.","family":"TERRY","sequence":"additional","affiliation":[{"name":"Institute for Integrated Micro and Nano Systems, Joint Research Institute for Integrated Systems and Scottish Microelectronics Centre, School of Engineering, The University of Edinburgh, The King's Buildings, Edinburgh EH9 3JF, UK"}]},{"given":"EILIDH","family":"GRANT","sequence":"additional","affiliation":[{"name":"EPCC, The University of Edinburgh, James Clerk Maxwell Building, Mayfield Road, Edinburgh EH9 3JZ, UK"}]},{"given":"NICOLA","family":"McDONNELL","sequence":"additional","affiliation":[{"name":"EPCC, The University of Edinburgh, James Clerk Maxwell Building, Mayfield Road, Edinburgh EH9 3JZ, UK"}]},{"given":"PETER","family":"GHAZAL","sequence":"additional","affiliation":[{"name":"Division of Pathway Medicine (DPM), The University of Edinburgh Medical School, The Chancellor's Building, 49 Little France Crescent, Edinburgh EH16 4SB, UK"},{"name":"Centre for Systems Biology at Edinburgh, The University of Edinburgh, Darwin Building, King's Buildings Campus, Mayfield Road, Edinburgh EH9 3JU, UK"}]}],"member":"219","published-online":{"date-parts":[[2011,11,21]]},"reference":[{"key":"rf1","doi-asserted-by":"publisher","DOI":"10.1126\/science.286.5439.531"},{"key":"rf2","doi-asserted-by":"publisher","DOI":"10.1038\/35000501"},{"key":"rf3","doi-asserted-by":"publisher","DOI":"10.1038\/89044"},{"key":"rf4","doi-asserted-by":"publisher","DOI":"10.1038\/415530a"},{"key":"rf5","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.0401994101"},{"key":"rf6","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.0409462102"},{"key":"rf7","doi-asserted-by":"crossref","first-page":"1189","DOI":"10.1124\/mol.60.6.1189","volume":"60","author":"Thomas R. S.","journal-title":"Mol. Pharmacol."},{"key":"rf8","doi-asserted-by":"publisher","DOI":"10.1016\/S0378-4274(01)00267-3"},{"key":"rf9","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.1632587100"},{"key":"rf10","doi-asserted-by":"publisher","DOI":"10.1101\/gr.2807605"},{"key":"rf11","doi-asserted-by":"publisher","DOI":"10.1182\/blood-2006-02-002477"},{"key":"rf12","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/17.12.1131"},{"key":"rf13","first-page":"1471","volume":"7","author":"D\u00edaz-Uriarte R.","journal-title":"BMC Bioinformatics"},{"key":"rf14","doi-asserted-by":"publisher","DOI":"10.1016\/0031-3203(79)90036-0"},{"key":"rf15","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2004.08.007"},{"key":"rf16","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/bti171"},{"key":"rf17","doi-asserted-by":"publisher","DOI":"10.1023\/A:1010933404324"},{"key":"rf18","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4757-2440-0"},{"key":"rf19","first-page":"273","volume":"20","author":"Cortes C.","journal-title":"Machine Learning"},{"key":"rf20","volume-title":"Statistical Learning Theory","author":"Vapnik V. N.","year":"1998"},{"key":"rf21","doi-asserted-by":"publisher","DOI":"10.1111\/j.1469-1809.1936.tb02137.x"},{"key":"rf22","doi-asserted-by":"publisher","DOI":"10.1002\/0471725293"},{"key":"rf23","volume-title":"Pattern Classification","author":"Duda R. O.","year":"2001"},{"key":"rf24","volume-title":"Nearest Neighbor (NN) Norms:NN Pattern Classification Techniques","author":"Dasarathy B. V.","year":"1991"},{"key":"rf25","volume-title":"Nearest-neighbor Methods in Learning and Vision:Theory and Practice","author":"Shakhnarovich G.","year":"2005"},{"key":"rf26","volume-title":"R:A Language and Environment for Statistical Computing","year":"2009"},{"key":"rf27","doi-asserted-by":"publisher","DOI":"10.1080\/01621459.1983.10477973"},{"key":"rf28","doi-asserted-by":"publisher","DOI":"10.1039\/b707122c"},{"key":"rf29","doi-asserted-by":"publisher","DOI":"10.1109\/TIT.1968.1054102"},{"key":"rf30","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.0601231103"},{"key":"rf31","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btn365"},{"key":"rf32","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btp295"},{"key":"rf33","volume":"3","author":"Smyth G. K.","journal-title":"Statistical Applications in Genetics and Molecular Biology"}],"container-title":["Journal of Bioinformatics and Computational Biology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.worldscientific.com\/doi\/pdf\/10.1142\/S0219720010005063","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2019,8,7]],"date-time":"2019-08-07T02:39:16Z","timestamp":1565145556000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.worldscientific.com\/doi\/abs\/10.1142\/S0219720010005063"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2010,12]]},"references-count":33,"journal-issue":{"issue":"06","published-online":{"date-parts":[[2011,11,21]]},"published-print":{"date-parts":[[2010,12]]}},"alternative-id":["10.1142\/S0219720010005063"],"URL":"https:\/\/doi.org\/10.1142\/s0219720010005063","relation":{},"ISSN":["0219-7200","1757-6334"],"issn-type":[{"value":"0219-7200","type":"print"},{"value":"1757-6334","type":"electronic"}],"subject":[],"published":{"date-parts":[[2010,12]]}}}