{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,24]],"date-time":"2026-01-24T15:42:19Z","timestamp":1769269339697,"version":"3.49.0"},"reference-count":49,"publisher":"Oxford University Press (OUP)","issue":"1","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2011,1,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Motivation: Microarray experiments frequently produce multiple missing values (MVs) due to flaws such as dust, scratches, insufficient resolution or hybridization errors on the chips. Unfortunately, many downstream algorithms require a complete data matrix. The motivation of this work is to determine the impact of MV imputation on downstream analysis, and whether ranking of imputation methods by imputation accuracy correlates well with the biological impact of the imputation.<\/jats:p><jats:p>Methods: Using eight datasets for differential expression (DE) and classification analysis and eight datasets for gene clustering, we demonstrate the biological impact of missing-value imputation on statistical downstream analyses, including three commonly employed DE methods, four classifiers and three gene-clustering methods. Correlation between the rankings of imputation methods based on three root-mean squared error (RMSE) measures and the rankings based on the downstream analysis methods was used to investigate which RMSE measure was most consistent with the biological impact measures, and which downstream analysis methods were the most sensitive to the choice of imputation procedure.<\/jats:p><jats:p>Results: DE was the most sensitive to the choice of imputation procedure, while classification was the least sensitive and clustering was intermediate between the two. 
The logged RMSE (LRMSE) measure had the highest correlation with the imputation rankings based on the DE results, indicating that the LRMSE is the best representative surrogate among the three RMSE-based measures. Bayesian principal component analysis and least squares adaptive appeared to be the best performing methods in the empirical downstream evaluation.<\/jats:p><jats:p>Contact: \u00a0ctseng@pitt.edu; guy.brock@louisville.edu<\/jats:p><jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btq613","type":"journal-article","created":{"date-parts":[[2010,11,3]],"date-time":"2010-11-03T01:07:20Z","timestamp":1288746440000},"page":"78-86","source":"Crossref","is-referenced-by-count":47,"title":["Biological impact of missing-value imputation on downstream analyses of gene expression profiles"],"prefix":"10.1093","volume":"27","author":[{"given":"Sunghee","family":"Oh","sequence":"first","affiliation":[{"name":"1 Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA, 2Department of Bioinformatics and Biostatistics, University of Louisville, Louisville, KY, 3Department of Computational Biology and 4Department of Human Genetics, University of Pittsburgh, Pittsburgh, PA, USA"}]},{"given":"Dongwan D.","family":"Kang","sequence":"additional","affiliation":[{"name":"1 Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA, 2Department of Bioinformatics and Biostatistics, University of Louisville, Louisville, KY, 3Department of Computational Biology and 4Department of Human Genetics, University of Pittsburgh, Pittsburgh, PA, USA"}]},{"given":"Guy N.","family":"Brock","sequence":"additional","affiliation":[{"name":"1 Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA, 2Department of Bioinformatics and Biostatistics, University of Louisville, Louisville, KY, 3Department of Computational Biology and 4Department of Human Genetics, University 
of Pittsburgh, Pittsburgh, PA, USA"}]},{"given":"George C.","family":"Tseng","sequence":"additional","affiliation":[{"name":"1 Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA, 2Department of Bioinformatics and Biostatistics, University of Louisville, Louisville, KY, 3Department of Computational Biology and 4Department of Human Genetics, University of Pittsburgh, Pittsburgh, PA, USA"},{"name":"1 Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA, 2Department of Bioinformatics and Biostatistics, University of Louisville, Louisville, KY, 3Department of Computational Biology and 4Department of Human Genetics, University of Pittsburgh, Pittsburgh, PA, USA"},{"name":"1 Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA, 2Department of Bioinformatics and Biostatistics, University of Louisville, Louisville, KY, 3Department of Computational Biology and 4Department of Human Genetics, University of Pittsburgh, Pittsburgh, PA, USA"}]}],"member":"286","published-online":{"date-parts":[[2010,11,2]]},"reference":[{"key":"2023012511154767800_B1","doi-asserted-by":"crossref","first-page":"253","DOI":"10.1093\/bib\/bbp059","article-title":"Dealing with missing values in large-scale studies: microarray data imputation and beyond","volume":"2","author":"Aittokallio","year":"2010","journal-title":"Brief. 
Bioinformatics"},{"key":"2023012511154767800_B2","doi-asserted-by":"crossref","first-page":"503","DOI":"10.1038\/35000501","article-title":"Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling","volume":"403","author":"Alizadeh","year":"2000","journal-title":"Nature"},{"key":"2023012511154767800_B3","doi-asserted-by":"crossref","first-page":"6745","DOI":"10.1073\/pnas.96.12.6745","article-title":"Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays","volume":"96","author":"Alon","year":"1999","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023012511154767800_B4","doi-asserted-by":"crossref","first-page":"790","DOI":"10.1073\/pnas.191502998","article-title":"Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses","volume":"98","author":"Bhattacharjee","year":"2001","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023012511154767800_B5","doi-asserted-by":"crossref","first-page":"816","DOI":"10.1038\/nm733","article-title":"Gene-expression profiles predict survival of patients with lung adenocarcinoma","volume":"8","author":"Beer","year":"2002","journal-title":"Nat. Med."},{"key":"2023012511154767800_B6","doi-asserted-by":"crossref","first-page":"289","DOI":"10.1111\/j.2517-6161.1995.tb02031.x","article-title":"Controlling the false discovery rate - a practical and powerful approach to multiple testing","volume":"57","author":"Benjamini","year":"1995","journal-title":"J. R. Stat. Soc. 
B"},{"key":"2023012511154767800_B7","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1093\/nar\/gnh026","article-title":"LSimpute: accurate estimation of missing values in microarray data with least squares methods","volume":"32","author":"Bo","year":"2004","journal-title":"Nucleic Acids Res"},{"key":"2023012511154767800_B8","doi-asserted-by":"crossref","first-page":"185","DOI":"10.1093\/bioinformatics\/19.2.185","article-title":"A comparison of normalization methods for high density oligonucleotide array data based on bias and variance","volume":"19","author":"Bolstad","year":"2003","journal-title":"Bioinformatics"},{"key":"2023012511154767800_B9","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/1471-2105-9-12","article-title":"Which missing value imputation method to use in expression profiles: a comparative study and two selection schemes","volume":"9","author":"Brock","year":"2008","journal-title":"BMC Bioinformatics"},{"key":"2023012511154767800_B10","doi-asserted-by":"crossref","first-page":"323","DOI":"10.1091\/mbc.12.2.323","article-title":"Remodeling of yeast genome expression in response to environmental changes","volume":"12","author":"Causton","year":"2001","journal-title":"Mol. Biol. 
Cell"},{"key":"2023012511154767800_B11","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/1471-2164-11-15","article-title":"Comparative analysis of missing value imputation methods to improve clustering and interpretation of microarray experiments","volume":"11","author":"Celton","year":"2010","journal-title":"BMC Genomics"},{"key":"2023012511154767800_B12","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/1471-2105-4-59","article-title":"Cross-platform comparison and visualization of gene expression data using co-inertia analysis","volume":"4","author":"Culhane","year":"2003","journal-title":"BMC Bioinformatics"},{"key":"2023012511154767800_B13","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/1471-2105-5-114","article-title":"Influence of microarrays experiments missing values on the stability of gene groups by hierarchical clustering","volume":"5","author":"de Brevern","year":"2004","journal-title":"BMC Bioinformatics"},{"key":"2023012511154767800_B14","doi-asserted-by":"crossref","first-page":"14863","DOI":"10.1073\/pnas.95.25.14863","article-title":"Cluster analysis and display of genome-wide expression patterns","volume":"95","author":"Eisen","year":"1998","journal-title":"Proc. Natl Acad. Sci. 
USA"},{"key":"2023012511154767800_B15","doi-asserted-by":"crossref","first-page":"179","DOI":"10.1111\/j.1469-1809.1936.tb02137.x","article-title":"The use of multiple measurements in taxonomic problems","volume":"7","author":"Fisher","year":"1936","journal-title":"Ann Eugen."},{"key":"2023012511154767800_B16","doi-asserted-by":"crossref","DOI":"10.1037\/e471672008-001","article-title":"Discriminatory analysis, nonparametric discrimination: consistency properties","volume-title":"Technical Report 4","author":"Fix","year":"1951"},{"key":"2023012511154767800_B17","doi-asserted-by":"crossref","first-page":"611","DOI":"10.1198\/016214502760047131","article-title":"Model-based clustering, discriminant analysis, and density estimation","volume":"97","author":"Fraley","year":"2002","journal-title":"J. Am. Stat. Assoc."},{"key":"2023012511154767800_B18","doi-asserted-by":"crossref","first-page":"531","DOI":"10.1126\/science.286.5439.531","article-title":"Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring","volume":"286","author":"Golub","year":"1999","journal-title":"Science"},{"key":"2023012511154767800_B19","doi-asserted-by":"crossref","first-page":"193","DOI":"10.1007\/BF01908075","article-title":"Comparing partitions","volume":"2","author":"Hubert","year":"1985","journal-title":"J. Classif."},{"key":"2023012511154767800_B20","doi-asserted-by":"crossref","first-page":"333","DOI":"10.1038\/77116","article-title":"Widespread aneuploidy revealed by DNA microarray expression profiling","volume":"25","author":"Hughes","year":"2000","journal-title":"Nat. Genet."},{"key":"2023012511154767800_B21","doi-asserted-by":"crossref","first-page":"819","DOI":"10.1089\/10665270050514954","article-title":"Analysis of variance for gene expression microarray data","volume":"7","author":"Kerr","year":"2000","journal-title":"J. Comput. 
Biol."},{"key":"2023012511154767800_B22","doi-asserted-by":"crossref","first-page":"1990","DOI":"10.1093\/bioinformatics\/btq323","article-title":"Over-optimism in bioinformatics: an illustration","volume":"21","author":"Jelizarow","year":"2010","journal-title":"Bioinformatics"},{"key":"2023012511154767800_B23","doi-asserted-by":"crossref","first-page":"4155","DOI":"10.1093\/bioinformatics\/bti638","article-title":"DNA microarray data imputation and significance analysis of differential expression","volume":"21","author":"Jornsten","year":"2005","journal-title":"Bioinformatics"},{"key":"2023012511154767800_B24","doi-asserted-by":"crossref","first-page":"1410","DOI":"10.1093\/bioinformatics\/btk053","article-title":"Missing value estimation for DNA microarray gene expression data: local least squares imputation","volume":"22","author":"Kim","year":"2006","journal-title":"Bioinformatics"},{"key":"2023012511154767800_B25","first-page":"179","article-title":"Self-organizing maps of massive databases","volume":"9","author":"Kohonen","year":"2001","journal-title":"Eng. Intell. Syst. Elect. Eng. Commun."},{"key":"2023012511154767800_B26","doi-asserted-by":"crossref","first-page":"811","DOI":"10.1073\/pnas.0304146101","article-title":"Gene expression profiling identifies clinically relevant subtypes of prostate cancer","volume":"101","author":"Lapointe","year":"2004","journal-title":"Proc. Natl Acad. Sci. 
USA"},{"key":"2023012511154767800_B27","first-page":"4683","article-title":"Human prostate cancer and benign prostatic hyperplasia: molecular dissection by gene expression profiling","volume":"61","author":"Luo","year":"2001","journal-title":"Cancer Res."},{"key":"2023012511154767800_B28","doi-asserted-by":"crossref","first-page":"169","DOI":"10.1016\/S0925-2312(03)00431-4","article-title":"The support vector machine under test","volume":"55","author":"Meyer","year":"2003","journal-title":"Neurocomputing"},{"key":"2023012511154767800_B29","doi-asserted-by":"crossref","first-page":"347","DOI":"10.6339\/JDS.2004.02(4).170","article-title":"Evaluation of missing value estimation for microarray data","volume":"2","author":"Nguyen","year":"2004","journal-title":"J. Data Sci."},{"key":"2023012511154767800_B30","doi-asserted-by":"crossref","first-page":"2088","DOI":"10.1093\/bioinformatics\/btg287","article-title":"A Bayesian missing value estimation method for gene expression profile data","volume":"19","author":"Oba","year":"2003","journal-title":"Bioinformatics"},{"key":"2023012511154767800_B31","doi-asserted-by":"crossref","first-page":"917","DOI":"10.1093\/bioinformatics\/bth007","article-title":"Gaussian mixture clustering and imputation of microarray data","volume":"20","author":"Ouyang","year":"2004","journal-title":"Bioinformatics"},{"key":"2023012511154767800_B32","doi-asserted-by":"crossref","first-page":"4272","DOI":"10.1093\/bioinformatics\/bti708","article-title":"The influence of missing value imputation on detection of differentially expressed genes from microarray data","volume":"21","author":"Scheel","year":"2005","journal-title":"Bioinformatics"},{"key":"2023012511154767800_B33","doi-asserted-by":"crossref","first-page":"499","DOI":"10.1016\/j.jbi.2007.10.005","article-title":"Ameliorative missing value imputation for robust biological knowledge inference","volume":"41","author":"Sehgal","year":"2008","journal-title":"J. Biomed. 
Inform."},{"key":"2023012511154767800_B34","doi-asserted-by":"crossref","first-page":"203","DOI":"10.1016\/S1535-6108(02)00030-2","article-title":"Gene expression correlates of clinical prostate cancer behavior","volume":"1","author":"Singh","year":"2002","journal-title":"Cancer Cell"},{"key":"2023012511154767800_B35","doi-asserted-by":"crossref","DOI":"10.2202\/1544-6115.1027","article-title":"Linear models and empirical bayes methods for assessing differential expression in microarray experiments","volume":"3","author":"Smyth","year":"2004","journal-title":"Stat. Appl. Genet. Mol. Biol."},{"key":"2023012511154767800_B36","doi-asserted-by":"crossref","first-page":"3273","DOI":"10.1091\/mbc.9.12.3273","article-title":"Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization","volume":"9","author":"Spellman","year":"1998","journal-title":"Mol. Biol. Cell"},{"key":"2023012511154767800_B37","doi-asserted-by":"crossref","first-page":"1164","DOI":"10.1093\/bioinformatics\/btm069","article-title":"pcaMethods\u2014a bioconductor package providing PCA methods for incomplete data","volume":"23","author":"Stacklies","year":"2007","journal-title":"Bioinformatics"},{"key":"2023012511154767800_B38","doi-asserted-by":"crossref","first-page":"10787","DOI":"10.1073\/pnas.191368598","article-title":"Chemosensitivity prediction by transcriptional profiling","volume":"98","author":"Staunton","year":"2001","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023012511154767800_B39","doi-asserted-by":"crossref","first-page":"6567","DOI":"10.1073\/pnas.082099299","article-title":"Diagnosis of multiple cancer types by shrunken centroids of gene expression","volume":"99","author":"Tibshirani","year":"2002","journal-title":"Proc. Natl Acad. Sci. 
USA"},{"key":"2023012511154767800_B40","doi-asserted-by":"crossref","first-page":"520","DOI":"10.1093\/bioinformatics\/17.6.520","article-title":"Missing value estimation methods for DNA microarrays","volume":"17","author":"Troyanskaya","year":"2001","journal-title":"Bioinformatics"},{"key":"2023012511154767800_B41","doi-asserted-by":"crossref","first-page":"566","DOI":"10.1093\/bioinformatics\/btk019","article-title":"Improving missing value estimation in microarray data with gene ontology","volume":"22","author":"Tuikkala","year":"2006","journal-title":"Bioinformatics"},{"key":"2023012511154767800_B42","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/1471-2105-9-202","article-title":"Missing value imputation improves clustering and interpretation of gene expression microarray data","volume":"9","author":"Tuikkala","year":"2008","journal-title":"BMC Bioinformatics"},{"key":"2023012511154767800_B43","doi-asserted-by":"crossref","first-page":"5116","DOI":"10.1073\/pnas.091062498","article-title":"Significance analysis of microarrays applied to the ionizing radiation response","volume":"98","author":"Tusher","year":"2001","journal-title":"Proc. Natl Acad. Sci. 
USA"},{"key":"2023012511154767800_B44","doi-asserted-by":"crossref","first-page":"530","DOI":"10.1038\/415530a","article-title":"Gene expression profiling predicts clinical outcome of breast cancer","volume":"415","author":"Van't Veer","year":"2002","journal-title":"Nature"},{"key":"2023012511154767800_B45","doi-asserted-by":"crossref","first-page":"2883","DOI":"10.1093\/bioinformatics\/btl339","article-title":"Effects of replacing the unreliable cDNA microarray measurements on the disease classification based on gene expression profiles and functional modules","volume":"22","author":"Wang","year":"2006","journal-title":"Bioinformatics"},{"key":"2023012511154767800_B46","doi-asserted-by":"crossref","first-page":"133","DOI":"10.1016\/S1535-6108(02)00032-6","article-title":"Classification, subtype discovery, and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling","volume":"1","author":"Yeoh","year":"2002","journal-title":"Cancer Cell"},{"key":"2023012511154767800_B47","doi-asserted-by":"crossref","first-page":"32","DOI":"10.1002\/1097-0142(1950)3:1<32::AID-CNCR2820030106>3.0.CO;2-3","article-title":"Index for rating diagnostic tests","volume":"3","author":"Youden","year":"1950","journal-title":"Cancer"},{"key":"2023012511154767800_B48","doi-asserted-by":"crossref","first-page":"2790","DOI":"10.1200\/JCO.2004.05.158","article-title":"Gene expression alterations in prostate cancer predicting tumor aggression and preceding development of malignancy","volume":"22","author":"Yu","year":"2004","journal-title":"J. Clin. 
Oncol."},{"key":"2023012511154767800_B49","doi-asserted-by":"crossref","first-page":"2057","DOI":"10.1093\/bioinformatics\/btn365","article-title":"Apparently low reproducibility of true differential expression discoveries in microarray studies","volume":"24","author":"Zhang","year":"2008","journal-title":"Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/27\/1\/78\/48861029\/bioinformatics_27_1_78.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/27\/1\/78\/48861029\/bioinformatics_27_1_78.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,2,27]],"date-time":"2025-02-27T14:42:44Z","timestamp":1740667364000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/27\/1\/78\/201857"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2010,11,2]]},"references-count":49,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2011,1,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btq613","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2011,1,1]]},"published":{"date-parts":[[2010,11,2]]}}}