{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,10]],"date-time":"2026-01-10T21:38:13Z","timestamp":1768081093291,"version":"3.49.0"},"reference-count":22,"publisher":"Oxford University Press (OUP)","issue":"22","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2005,11,15]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Motivation: Significance analysis of differential expression in DNA microarray data is an important task. Much of the current research is focused on developing improved tests and software tools. The task is difficult not only owing to the high dimensionality of the data (number of genes), but also because of the often non-negligible presence of missing values. There is thus a great need to reliably impute these missing values prior to the statistical analyses. Many imputation methods have been developed for DNA microarray data, but their impact on statistical analyses has not been well studied. In this work we examine how missing values and their imputation affect significance analysis of differential expression.<\/jats:p><jats:p>Results: We develop a new imputation method (LinCmb) that is superior to the widely used methods in terms of normalized root mean squared error. Its estimates are the convex combinations of the estimates of existing methods. We find that LinCmb adapts to the structure of the data: If the data are heterogeneous or if there are few missing values, LinCmb puts more weight on local imputation methods; if the data are homogeneous or if there are many missing values, LinCmb puts more weight on global imputation methods. Thus, LinCmb is a useful tool to understand the merits of different imputation methods. We also demonstrate that missing values affect significance analysis. 
Two datasets, different amounts of missing values, different imputation methods, the standard t-test and the regularized t-test and ANOVA are employed in the simulations. We conclude that good imputation alleviates the impact of missing values and should be an integral part of microarray data analysis. The most competitive methods are LinCmb, GMC and BPCA. Popular imputation schemes such as SVD, row mean, and KNN all exhibit high variance and poor performance. The regularized t-test is less affected by missing values than the standard t-test.<\/jats:p><jats:p>Availability: Matlab code is available on request from the authors.<\/jats:p><jats:p>Contact: rebecka@stat.rutgers.edu; ouyangmi@umdnj.edu<\/jats:p>","DOI":"10.1093\/bioinformatics\/bti638","type":"journal-article","created":{"date-parts":[[2005,8,24]],"date-time":"2005-08-24T02:43:26Z","timestamp":1124851406000},"page":"4155-4161","source":"Crossref","is-referenced-by-count":95,"title":["DNA microarray data imputation and significance analysis of differential expression"],"prefix":"10.1093","volume":"21","author":[{"given":"Rebecka","family":"J\u00f6rnsten","sequence":"first","affiliation":[{"name":"Department of Statistics, Rutgers, the State University of New Jersey, New Brunswick, NJ 08903, USA"}]},{"given":"Hui-Yu","family":"Wang","sequence":"additional","affiliation":[{"name":"9 Stoecker Road, Holmdel, NJ 07733, USA"}]},{"given":"William J.","family":"Welsh","sequence":"additional","affiliation":[{"name":"Department of Pharmacology, Robert Wood Johnson Medical School, and Informatics Institute, University of Medicine and Dentistry of New Jersey, Piscataway, NJ 08854, USA"}]},{"given":"Ming","family":"Ouyang","sequence":"additional","affiliation":[{"name":"Department of Pharmacology, Robert Wood Johnson Medical School, and Informatics Institute, University of Medicine and Dentistry of New Jersey, Piscataway, NJ 
08854, USA"}]}],"member":"286","published-online":{"date-parts":[[2005,8,23]]},"reference":[{"key":"2023061007105161100_b1","doi-asserted-by":"crossref","first-page":"509","DOI":"10.1093\/bioinformatics\/17.6.509","article-title":"A Bayesian framework for the analysis of microarray expression data: regularized t-test and statistical inferences of gene changes","volume":"17","author":"Baldi","year":"2001","journal-title":"Bioinformatics"},{"key":"2023061007105161100_b2","doi-asserted-by":"crossref","first-page":"803","DOI":"10.2307\/2532201","article-title":"Model-based Gaussian and non-Gaussian clustering","volume":"49","author":"Banfield","year":"1993","journal-title":"Biometrics"},{"key":"2023061007105161100_b3","doi-asserted-by":"crossref","first-page":"341","DOI":"10.1089\/10665270360688057","article-title":"Continuous representations of time-series gene expression data","volume":"10","author":"Bar-Joseph","year":"2003","journal-title":"J. Comput. Biol."},{"key":"2023061007105161100_b4","doi-asserted-by":"crossref","first-page":"289","DOI":"10.1111\/j.2517-6161.1995.tb02031.x","article-title":"Controlling the false discovery rate: a practical and powerful approach to multiple testing","volume":"57","author":"Benjamini","year":"1995","journal-title":"J. R. Stat. Soc. B."},{"key":"2023061007105161100_b5","doi-asserted-by":"crossref","first-page":"e34","DOI":"10.1093\/nar\/gnh026","article-title":"LSimpute: accurate estimation of missing values in microarray data with least squares methods","volume":"32","author":"B\u00f8","year":"2004","journal-title":"Nucleic Acids Res."},{"key":"2023061007105161100_b6","doi-asserted-by":"crossref","first-page":"33","DOI":"10.1038\/4462","article-title":"Exploring the new world of the genome with DNA microarrays","volume":"21","author":"Brown","year":"1999","journal-title":"Nat. 
Genet."},{"key":"2023061007105161100_b7","doi-asserted-by":"crossref","first-page":"1929","DOI":"10.1091\/mbc.02-02-0023","article-title":"Gene expression patterns in human liver cancers","volume":"13","author":"Chen","year":"2002","journal-title":"Mol. Biol. Cell."},{"key":"2023061007105161100_b8","doi-asserted-by":"crossref","first-page":"210","DOI":"10.1186\/gb-2003-4-4-210","article-title":"Statistical tests for differential expression in cDNA microarray experiments","volume":"4","author":"Cui","year":"2003","journal-title":"Genome Biol."},{"key":"2023061007105161100_b9","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1111\/j.2517-6161.1977.tb01600.x","article-title":"Maximum likelihood from incomplete data via the EM algorithm (with discussion)","volume":"39","author":"Dempster","year":"1977","journal-title":"J. R. Stat. Soc. B"},{"key":"2023061007105161100_b10","doi-asserted-by":"crossref","first-page":"14863","DOI":"10.1073\/pnas.95.25.14863","article-title":"Cluster analysis and display of genome-wide expression patterns","volume":"95","author":"Eisen","year":"1998","journal-title":"Proc. Natl Acad. Sci. 
USA"},{"key":"2023061007105161100_b11","doi-asserted-by":"crossref","first-page":"94","DOI":"10.1093\/nar\/gkg078","article-title":"The Stanford Microarray Database: data access and quality assessment tools","volume":"31","author":"Gollub","year":"2003","journal-title":"Nucleic Acids Res."},{"key":"2023061007105161100_b12","doi-asserted-by":"crossref","DOI":"10.1007\/978-0-387-21606-5","volume-title":"The elements of statistical learning","author":"Hastie","year":"2001"},{"key":"2023061007105161100_b13","doi-asserted-by":"crossref","first-page":"187","DOI":"10.1093\/bioinformatics\/bth499","article-title":"Missing value estimation for DNA microarray gene expression data: local least squares imputation","volume":"21","author":"Kim","year":"2005","journal-title":"Bioinformatics"},{"key":"2023061007105161100_b14","doi-asserted-by":"crossref","first-page":"20","DOI":"10.1038\/4447","article-title":"High density synthetic oligonucleotide arrays","volume":"21","author":"Lipshutz","year":"1999","journal-title":"Nat. Genet."},{"key":"2023061007105161100_b15","doi-asserted-by":"crossref","first-page":"2088","DOI":"10.1093\/bioinformatics\/btg287","article-title":"A Bayesian missing value estimation method for gene expression profile data","volume":"19","author":"Oba","year":"2003","journal-title":"Bioinformatics"},{"key":"2023061007105161100_b16","doi-asserted-by":"crossref","first-page":"917","DOI":"10.1093\/bioinformatics\/bth007","article-title":"Gaussian mixture clustering and imputation of microarray data","volume":"20","author":"Ouyang","year":"2004","journal-title":"Bioinformatics"},{"key":"2023061007105161100_b17","doi-asserted-by":"crossref","first-page":"201","DOI":"10.1152\/physiolgenomics.00177.2003","article-title":"Screening anti-inflammatory compounds in injured spinal cord with microarrays: a comparison of bioinformatics analysis approaches","volume":"17","author":"Pan","year":"2004","journal-title":"Physiol. 
Genomics"},{"key":"2023061007105161100_b18","doi-asserted-by":"crossref","first-page":"496","DOI":"10.1038\/ng1032","article-title":"Microarray data normalization and transformation","volume":"32","author":"Quackenbush","year":"2002","journal-title":"Nat. Genet."},{"key":"2023061007105161100_b19","doi-asserted-by":"crossref","first-page":"i255","DOI":"10.1093\/bioinformatics\/btg1036","article-title":"Using hidden Markov models to analyze gene expression time course data","volume":"19","author":"Schliep","year":"2003","journal-title":"Bioinformatics"},{"key":"2023061007105161100_b20","doi-asserted-by":"crossref","first-page":"2417","DOI":"10.1093\/bioinformatics\/bti345","article-title":"Collateral missing value imputation: a new robust missing value estimation algorithm for microarray data","volume":"21","author":"Sehgal","year":"2005","journal-title":"Bioinformatics"},{"key":"2023061007105161100_b21","doi-asserted-by":"crossref","first-page":"520","DOI":"10.1093\/bioinformatics\/17.6.520","article-title":"Missing value estimation methods for DNA microarrays","volume":"17","author":"Troyanskaya","year":"2001","journal-title":"Bioinformatics"},{"key":"2023061007105161100_b22","doi-asserted-by":"crossref","first-page":"2302","DOI":"10.1093\/bioinformatics\/btg323","article-title":"Missing-value estimation using linear and non-linear regression with Bayesian gene 
selection","volume":"19","author":"Zhou","year":"2003","journal-title":"Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/21\/22\/4155\/50566171\/bioinformatics_21_22_4155.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/21\/22\/4155\/50566171\/bioinformatics_21_22_4155.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,1,3]],"date-time":"2025-01-03T19:11:47Z","timestamp":1735931507000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/21\/22\/4155\/194305"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2005,8,23]]},"references-count":22,"journal-issue":{"issue":"22","published-print":{"date-parts":[[2005,11,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/bti638","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2005,11,15]]},"published":{"date-parts":[[2005,8,23]]}}}