{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,6]],"date-time":"2026-01-06T01:54:58Z","timestamp":1767664498816},"reference-count":22,"publisher":"Oxford University Press (OUP)","issue":"24","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2007,12,15]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Motivation: Feature selection methods aim to reduce the complexity of data and to uncover the most relevant biological variables. In reality, information in biological datasets is often incomplete as a result of untrustworthy samples and missing values. The reliability of selection methods may therefore be questioned.<\/jats:p><jats:p>Method: Information loss is incorporated into a perturbation scheme, testing which features are stable under it. This method is applied to data analysis by unsupervised feature filtering (UFF). The latter has been shown to be a very successful method in analysis of gene-expression data.<\/jats:p><jats:p>Results: We find that the UFF quality degrades smoothly with information loss. It remains successful even under substantial damage. Our method allows for selection of a best imputation method on a dataset treated by UFF. More importantly, scoring features according to their stability under information loss is shown to be correlated with biological importance in cancer studies. 
This scoring may lead to novel biological insights.<\/jats:p><jats:p>Contact: royke@cs.huji.ac.il<\/jats:p><jats:p>Supplementary information and code availability: Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btm528","type":"journal-article","created":{"date-parts":[[2007,11,8]],"date-time":"2007-11-08T01:26:33Z","timestamp":1194485193000},"page":"3343-3349","source":"Crossref","is-referenced-by-count":19,"title":["Unsupervised feature selection under perturbations: meeting the challenges of biological data"],"prefix":"10.1093","volume":"23","author":[{"given":"Roy","family":"Varshavsky","sequence":"first","affiliation":[{"name":"1 School of Computer Science and Engineering, The Hebrew University of Jerusalem 91904, 2 School of Physics and Astronomy, Tel Aviv University 69978 and 3 Department of Biological Chemistry, Institute of Life Sciences, The Hebrew University of Jerusalem 91904, Israel"}]},{"given":"Assaf","family":"Gottlieb","sequence":"additional","affiliation":[{"name":"1 School of Computer Science and Engineering, The Hebrew University of Jerusalem 91904, 2 School of Physics and Astronomy, Tel Aviv University 69978 and 3 Department of Biological Chemistry, Institute of Life Sciences, The Hebrew University of Jerusalem 91904, Israel"}]},{"given":"David","family":"Horn","sequence":"additional","affiliation":[{"name":"1 School of Computer Science and Engineering, The Hebrew University of Jerusalem 91904, 2 School of Physics and Astronomy, Tel Aviv University 69978 and 3 Department of Biological Chemistry, Institute of Life Sciences, The Hebrew University of Jerusalem 91904, Israel"}]},{"given":"Michal","family":"Linial","sequence":"additional","affiliation":[{"name":"1 School of Computer Science and Engineering, The Hebrew University of Jerusalem 91904, 2 School of Physics and Astronomy, Tel Aviv University 69978 and 3 Department of Biological Chemistry, Institute of Life Sciences, The Hebrew 
University of Jerusalem 91904, Israel"}]}],"member":"286","published-online":{"date-parts":[[2007,11,7]]},"reference":[{"key":"2023041107271237500_","doi-asserted-by":"crossref","first-page":"10101","DOI":"10.1073\/pnas.97.18.10101","article-title":"Singular value decomposition for genome-wide expression data processing and modeling","volume":"97","author":"Alter","year":"2000","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023041107271237500_","doi-asserted-by":"crossref","first-page":"816","DOI":"10.1038\/nm733","article-title":"Gene-expression profiles predict survival of patients with lung adenocarcinoma","volume":"8","author":"Beer","year":"2002","journal-title":"Nat. Med"},{"key":"2023041107271237500_","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/gb-2002-3-4-research0017","article-title":"New feature subset selection procedures for classification of expression profiles","volume":"3","author":"B\u00f8","year":"2002","journal-title":"Genome Biol"},{"key":"2023041107271237500_","first-page":"237","article-title":"Noise-based feature perturbation as a selection method for microarray data","volume-title":"ISBRA","author":"Chen","year":"2007"},{"key":"2023041107271237500_","doi-asserted-by":"crossref","first-page":"114","DOI":"10.1186\/1471-2105-5-114","article-title":"Influence of microarrays experiments missing values on the stability of gene groups by hierarchical clustering","volume":"5","author":"de Brevern","year":"2004","journal-title":"BMC Bioinformatics"},{"key":"2023041107271237500_","first-page":"845","article-title":"Feature selection for unsupervised learning","volume":"5","author":"Dy","year":"2004","journal-title":"J. Mach. Learn. 
Res"},{"key":"2023041107271237500_","doi-asserted-by":"crossref","first-page":"5923","DOI":"10.1073\/pnas.0601231103","article-title":"Thousands of samples are needed to generate a robust gene list for predicting outcome in cancer","volume":"103","author":"Ein-Dor","year":"2006","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023041107271237500_","doi-asserted-by":"crossref","first-page":"1608","DOI":"10.1093\/nar\/gkl047","article-title":"Microarray missing data imputation based on a set theoretic framework and biological knowledge","volume":"34","author":"Gan","year":"2006","journal-title":"Nucleic Acids Res"},{"key":"2023041107271237500_","first-page":"1157","article-title":"An introduction to variable and feature selection","volume":"3","author":"Guyon","year":"2003","journal-title":"J. Mach. Learn. Res"},{"key":"2023041107271237500_","doi-asserted-by":"crossref","first-page":"655","DOI":"10.1093\/bioinformatics\/btg040","article-title":"Gene expression data preprocessing","volume":"19","author":"Herrero","year":"2003","journal-title":"Bioinformatics"},{"key":"2023041107271237500_","doi-asserted-by":"crossref","first-page":"1110","DOI":"10.1093\/bioinformatics\/btg053","article-title":"Novel clustering algorithm for microarray expression data in a truncated SVD space","volume":"19","author":"Horn","year":"2003","journal-title":"Bioinformatics"},{"key":"2023041107271237500_","doi-asserted-by":"crossref","first-page":"747","DOI":"10.1093\/bioinformatics\/btm010","article-title":"An ensemble approach to microarray data-based gene prioritization after missing value imputation","volume":"23","author":"Hua","year":"2007","journal-title":"Bioinformatics"},{"key":"2023041107271237500_","doi-asserted-by":"crossref","first-page":"673","DOI":"10.1038\/89044","article-title":"Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks","volume":"7","author":"Khan","year":"2001","journal-title":"Nat. 
Med"},{"key":"2023041107271237500_","doi-asserted-by":"crossref","first-page":"9","DOI":"10.1186\/1745-6150-2-9","article-title":"How high is the level of technical noise in microarray data?","volume":"2","author":"Klebanov","year":"2007","journal-title":"Biol. Direct"},{"key":"2023041107271237500_","doi-asserted-by":"crossref","first-page":"4272","DOI":"10.1093\/bioinformatics\/bti708","article-title":"The influence of missing value imputation on detection of differentially expressed genes from microarray data","volume":"21","author":"Scheel","year":"2005","journal-title":"Bioinformatics"},{"key":"2023041107271237500_","doi-asserted-by":"crossref","first-page":"1151","DOI":"10.1038\/nbt1239","article-title":"The microarray quality control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements","volume":"24","author":"Shi","year":"2006","journal-title":"Nat. Biotechnol"},{"key":"2023041107271237500_","doi-asserted-by":"crossref","first-page":"4232","DOI":"10.1038\/sj.onc.1208601","article-title":"Rare amplicons implicate frequent deregulation of cell fate specification pathways in oral squamous cell carcinoma","volume":"24","author":"Snijders","year":"2005","journal-title":"Oncogene"},{"key":"2023041107271237500_","doi-asserted-by":"crossref","first-page":"520","DOI":"10.1093\/bioinformatics\/17.6.520","article-title":"Missing value estimation methods for DNA microarrays","volume":"17","author":"Troyanskaya","year":"2001","journal-title":"Bioinformatics"},{"key":"2023041107271237500_","doi-asserted-by":"crossref","first-page":"566","DOI":"10.1093\/bioinformatics\/btk019","article-title":"Improving missing value estimation in microarray data with gene ontology","volume":"22","author":"Tuikkala","year":"2006","journal-title":"Bioinformatics"},{"key":"2023041107271237500_","doi-asserted-by":"crossref","first-page":"e507","DOI":"10.1093\/bioinformatics\/btl214","article-title":"Novel unsupervised feature filtering of biological 
data","volume":"22","author":"Varshavsky","year":"2006","journal-title":"Bioinformatics"},{"key":"2023041107271237500_","doi-asserted-by":"crossref","first-page":"D358","DOI":"10.1093\/nar\/gkl825","article-title":"STRING 7 \u2013 recent developments in the integration and prediction of protein interactions","volume":"35","author":"Mering","year":"2007","journal-title":"Nucleic Acids Res"},{"key":"2023041107271237500_","doi-asserted-by":"crossref","first-page":"2883","DOI":"10.1093\/bioinformatics\/btl339","article-title":"Effects of replacing the unreliable cDNA microarray measurements on the disease classification based on gene-expression profiles and functional modules","volume":"22","author":"Wang","year":"2006","journal-title":"Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/23\/24\/3343\/49820790\/bioinformatics_23_24_3343.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/23\/24\/3343\/49820790\/bioinformatics_23_24_3343.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,5,14]],"date-time":"2023-05-14T17:17:53Z","timestamp":1684084673000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/23\/24\/3343\/1745875"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2007,11,7]]},"references-count":22,"journal-issue":{"issue":"24","published-print":{"date-parts":[[2007,12,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btm528","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2007,12,15]]},"published":{"date-parts":[[2007,11,7]]}}}