{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,27]],"date-time":"2025-10-27T20:35:14Z","timestamp":1761597314224},"reference-count":29,"publisher":"Oxford University Press (OUP)","issue":"2","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2012,1,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Feature selection is a key concept in machine learning for microarray datasets, where features represented by probesets are typically several orders of magnitude larger than the available sample size. Computational tractability is a key challenge for feature selection algorithms in handling very high-dimensional datasets beyond a hundred thousand features, such as in datasets produced on single nucleotide polymorphism microarrays. In this article, we present a novel feature set reduction approach that enables scalable feature selection on datasets with hundreds of thousands of features and beyond. Our approach enables more efficient handling of higher resolution datasets to achieve better disease subtype classification of samples for potentially more accurate diagnosis and prognosis, which allows clinicians to make more informed decisions in regards to patient treatment options.<\/jats:p>\n               <jats:p>Results: We applied our feature set reduction approach to several publicly available cancer single nucleotide polymorphism (SNP) array datasets and evaluated its performance in terms of its multiclass predictive classification accuracy over different cancer subtypes, its speedup in execution as well as its scalability with respect to sample size and array resolution. Feature Set Reduction (FSR) was able to reduce the dimensions of an SNP array dataset by more than two orders of magnitude while achieving at least equal, and in most cases superior predictive classification performance over that achieved on features selected by existing feature selection methods alone. An examination of the biological relevance of frequently selected features from FSR-reduced feature sets revealed strong enrichment in association with cancer.<\/jats:p>\n               <jats:p>Availability: \u00a0FSR was implemented in MATLAB R2010b and is available at http:\/\/ww2.cs.mu.oz.au\/~gwong\/FSR<\/jats:p>\n               <jats:p>Contact: \u00a0gwong@csse.unimelb.edu.au<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available from Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btr644","type":"journal-article","created":{"date-parts":[[2011,11,23]],"date-time":"2011-11-23T02:11:40Z","timestamp":1322014300000},"page":"151-159","source":"Crossref","is-referenced-by-count":11,"title":["FSR: feature set reduction for scalable and accurate multi-class cancer subtype classification based on copy number"],"prefix":"10.1093","volume":"28","author":[{"given":"Gerard","family":"Wong","sequence":"first","affiliation":[{"name":"1 National ICT Australia, Victoria Research Laboratory, Parkville, 2Department of Computer Science and Software Engineering, The University of Melbourne, Carlton and 3Department of Electrical Engineering, The University of Melbourne, Parkville, Australia"},{"name":"1 National ICT Australia, Victoria Research Laboratory, Parkville, 2Department of Computer Science and Software Engineering, The University of Melbourne, Carlton and 3Department of Electrical Engineering, The University of Melbourne, Parkville, Australia"}]},{"given":"Christopher","family":"Leckie","sequence":"additional","affiliation":[{"name":"1 National ICT Australia, Victoria Research Laboratory, Parkville, 2Department of Computer Science and Software Engineering, The University of Melbourne, Carlton and 3Department of Electrical Engineering, The University of Melbourne, Parkville, Australia"},{"name":"1 National ICT Australia, Victoria Research Laboratory, Parkville, 2Department of Computer Science and Software Engineering, The University of Melbourne, Carlton and 3Department of Electrical Engineering, The University of Melbourne, Parkville, Australia"}]},{"given":"Adam","family":"Kowalczyk","sequence":"additional","affiliation":[{"name":"1 National ICT Australia, Victoria Research Laboratory, Parkville, 2Department of Computer Science and Software Engineering, The University of Melbourne, Carlton and 3Department of Electrical Engineering, The University of Melbourne, Parkville, Australia"},{"name":"1 National ICT Australia, Victoria Research Laboratory, Parkville, 2Department of Computer Science and Software Engineering, The University of Melbourne, Carlton and 3Department of Electrical Engineering, The University of Melbourne, Parkville, Australia"}]}],"member":"286","published-online":{"date-parts":[[2011,11,21]]},"reference":[{"key":"2023012511342956200_B1","doi-asserted-by":"crossref","first-page":"D885","DOI":"10.1093\/nar\/gkn764","article-title":"NCBI GEO: archive for high-throughput functional genomic data","volume":"37","author":"Barrett","year":"2009","journal-title":"Nucleic Acids Res."},{"key":"2023012511342956200_B2","doi-asserted-by":"crossref","first-page":"1765","DOI":"10.1016\/S0002-9440(10)63536-5","article-title":"Classifying melanocytic tumors based on DNA copy number changes","volume":"163","author":"Bastian","year":"2003","journal-title":"Am. J. Pathol."},{"key":"2023012511342956200_B3","doi-asserted-by":"crossref","first-page":"i139","DOI":"10.1093\/bioinformatics\/btn272","article-title":"A fast and flexible method for the segmentation of aCGH data","volume":"24","author":"Ben-Yaacov","year":"2008","journal-title":"Bioinformatics"},{"key":"2023012511342956200_B4","doi-asserted-by":"crossref","first-page":"1033","DOI":"10.1002\/gcc.20366","article-title":"Distinct patterns of dna copy number alteration are associated with different clinicopathological features and gene-expression subtypes of breast cancer","volume":"45","author":"Bergamaschi","year":"2006","journal-title":"Genes Chromosomes Cancer"},{"key":"2023012511342956200_B5","first-page":"265","article-title":"On the algorithmic implementation of multiclass kernel-based vector machines","volume":"2","author":"Crammer","year":"2002","journal-title":"J. Mach. Learn. Res."},{"key":"2023012511342956200_B6","doi-asserted-by":"crossref","first-page":"77","DOI":"10.1198\/016214502753479248","article-title":"Comparison of discrimination methods for the classification of tumors using gene expression data","volume":"97","author":"Dudoit","year":"2002","journal-title":"J. Am. Stat. Assoc."},{"key":"2023012511342956200_B7","first-page":"1871","article-title":"LIBLINEAR: A library for large linear classification","volume":"9","author":"Fan","year":"2008","journal-title":"J. Mach. Learn. Res."},{"key":"2023012511342956200_B8","doi-asserted-by":"crossref","first-page":"4731","DOI":"10.1158\/1078-0432.CCR-07-0502","article-title":"High-resolution single nucleotide polymorphism array analysis of epithelial ovarian cancer reveals numerous microdeletions and amplifications","volume":"13","author":"Gorringe","year":"2007","journal-title":"Clin. Cancer Res."},{"key":"2023012511342956200_B9","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/1756-9966-28-103","article-title":"Lung cancer gene expression database analysis incorporating prior knowledge with support vector machine-based classification method","volume":"28","author":"Guan","year":"2009","journal-title":"J. Exp. Clin. Cancer Res."},{"key":"2023012511342956200_B10","doi-asserted-by":"crossref","first-page":"1161","DOI":"10.3324\/haematol.2010.039768","article-title":"Array-based genomic screening at diagnosis and follow-up in chronic lymphocytic leukemia","volume":"96","author":"Gunnarsson","year":"2011","journal-title":"Haematologica"},{"key":"2023012511342956200_B11","doi-asserted-by":"crossref","first-page":"21","DOI":"10.1186\/1755-8794-2-21","article-title":"High-resolution analysis of copy number alterations and associated expression changes in ovarian tumors","volume":"2","author":"Haverty","year":"2009","journal-title":"BMC Med. Genomics"},{"key":"2023012511342956200_B12","doi-asserted-by":"crossref","first-page":"2542","DOI":"10.1158\/0008-5472.CAN-04-3247","article-title":"Genome-wide association study in esophageal cancer using GeneChip mapping 10K array","volume":"65","author":"Hu","year":"2005","journal-title":"Cancer Res."},{"key":"2023012511342956200_B13","doi-asserted-by":"crossref","first-page":"937","DOI":"10.1016\/j.exphem.2009.04.012","article-title":"Identified hidden genomic changes in mantle cell lymphoma using high-resolution single nucleotide polymorphism genomic array","volume":"37","author":"Kawamata","year":"2009","journal-title":"Exp. Hematol."},{"key":"2023012511342956200_B14","first-page":"171","article-title":"Estimating attributes: Analysis and extensions of relief","volume-title":"European Conference on Machine Learning.","author":"Kononenko","year":"1994"},{"key":"2023012511342956200_B15","first-page":"13","article-title":"Genome-wide copy number variation analysis in attention-deficit \/ hyperactivity disorder: association with neuropeptide Y gene dosage in an extended pedigree","volume":"1","author":"Lesch","year":"2010","journal-title":"Mol. Psychiatry"},{"key":"2023012511342956200_B16","doi-asserted-by":"crossref","first-page":"2494","DOI":"10.1093\/hmg\/ddm205","article-title":"Evaluation of genome-wide power of genetic association studies based on empirical data from the HapMap project","volume":"16","author":"Nannya","year":"2007","journal-title":"Hum. Mol. Genet."},{"key":"2023012511342956200_B17","first-page":"5352","article-title":"Array comparative genome hybridization for tumor classification and gene discovery in mouse models of malignant melanoma","volume":"63","author":"O'Hagan","year":"2003","journal-title":"Cancer Res."},{"key":"2023012511342956200_B18","doi-asserted-by":"crossref","first-page":"320","DOI":"10.1186\/1471-2105-7-320","article-title":"Differential prioritization between relevance and redundancy in correlation-based feature selection techniques for multiclass gene expression data","volume":"7","author":"Ooi","year":"2006","journal-title":"BMC Bioinformatics"},{"key":"2023012511342956200_B19","doi-asserted-by":"crossref","first-page":"1226","DOI":"10.1109\/TPAMI.2005.159","article-title":"Feature selection based on mutual information: Criteria of max-dependency, max-relevance, and min-redundancy","volume":"27","author":"Peng","year":"2005","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"2023012511342956200_B20","doi-asserted-by":"crossref","first-page":"e9983","DOI":"10.1371\/journal.pone.0009983","article-title":"Identification of candidate growth promoting genes in ovarian cancer through integrated copy number and expression analysis","volume":"5","author":"Ramakrishna","year":"2010","journal-title":"PLoS One"},{"key":"2023012511342956200_B21","doi-asserted-by":"crossref","first-page":"1595","DOI":"10.1182\/blood-2010-01-264275","article-title":"Genome-wide DNA profiling of marginal zone lymphomas identifies subtype-specific lesions with an impact on the clinical outcome","volume":"117","author":"Rinaldi","year":"2011","journal-title":"Blood"},{"key":"2023012511342956200_B22","doi-asserted-by":"crossref","first-page":"2507","DOI":"10.1093\/bioinformatics\/btm344","article-title":"A review of feature selection techniques in bioinformatics","volume":"23","author":"Saeys","year":"2007","journal-title":"Bioinformatics"},{"key":"2023012511342956200_B23","doi-asserted-by":"crossref","first-page":"958","DOI":"10.1371\/journal.pone.0000958","article-title":"Effects of environment, genetics and data analysis pitfalls in an esophageal cancer genome-wide association study","volume":"2","author":"Statnikov","year":"2007","journal-title":"PloS One"},{"key":"2023012511342956200_B24","doi-asserted-by":"crossref","first-page":"789","DOI":"10.1038\/nature02168","article-title":"The International HapMap Project","volume":"426","author":"The International HapMap Consortium","year":"2003","journal-title":"Nature"},{"key":"2023012511342956200_B25","first-page":"248","article-title":"An interval tree based feature reduction method for cancer classification using high-throughput DNA copy number data","volume-title":"International Conference on Bioinformatics and Computational Biology, BIOCOMP","author":"Wang","year":"2007"},{"key":"2023012511342956200_B26","first-page":"1057","article-title":"Tumor classification based on DNA copy number aberrations determined using SNP arrays","volume":"15","author":"Wang","year":"2006","journal-title":"Oncol. Rep."},{"key":"2023012511342956200_B27","doi-asserted-by":"crossref","first-page":"5864","DOI":"10.1109\/IEMBS.2006.260116","article-title":"Cancer classification using loss of heterozygosity data derived from single-nucleotide polymorphism genotyping arrays","volume-title":"28th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2006. EMBS'06","author":"Wang","year":"2006"},{"key":"2023012511342956200_B28","doi-asserted-by":"crossref","first-page":"464","DOI":"10.1093\/bioinformatics\/btp708","article-title":"CMDS: a population-based method for identifying recurrent DNA copy number aberrations in cancer from high-resolution data","volume":"26","author":"Zhang","year":"2010","journal-title":"Bioinformatics"},{"key":"2023012511342956200_B29","doi-asserted-by":"crossref","first-page":"590","DOI":"10.1158\/1940-6207.CAPR-08-0233","article-title":"Noninvasive detection of candidate molecular biomarkers in subjects with a history of insulin resistance and colorectal adenomas","volume":"2","author":"Zhao","year":"2009","journal-title":"Cancer Prev. Res."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/28\/2\/151\/48868877\/bioinformatics_28_2_151.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/28\/2\/151\/48868877\/bioinformatics_28_2_151.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,25]],"date-time":"2023-01-25T11:38:34Z","timestamp":1674646714000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/28\/2\/151\/198965"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2011,11,21]]},"references-count":29,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2012,1,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btr644","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2012,1,15]]},"published":{"date-parts":[[2011,11,21]]}}}