{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,9]],"date-time":"2026-01-09T12:24:16Z","timestamp":1767961456978,"version":"3.49.0"},"reference-count":39,"publisher":"Oxford University Press (OUP)","issue":"16","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2009,8,15]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Motivation: Permutation testing is very popular for analyzing microarray data to identify differentially expressed (DE) genes; estimating false discovery rates (FDRs) is a very popular way to address the inherent multiple testing problem. However, combining these approaches may be problematic when sample sizes are unequal.<\/jats:p><jats:p>Results: With unbalanced data, permutation tests may not be suitable because they do not test the hypothesis of interest. In addition, permutation tests can be biased. Using biased P-values to estimate the FDR can produce unacceptable bias in those estimates. Results also show that the approach of pooling permutation null distributions across genes can produce invalid P-values, since even non-DE genes can have different permutation null distributions. We encourage researchers to use statistics that have been shown to reliably discriminate DE genes, but caution that associated P-values may be either invalid, or a less-effective metric for discriminating DE genes.<\/jats:p><jats:p>Contact: \u00a0katiek@u.washington.edu<\/jats:p><jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btp363","type":"journal-article","created":{"date-parts":[[2009,6,16]],"date-time":"2009-06-16T00:26:03Z","timestamp":1245111963000},"page":"2035-2041","source":"Crossref","is-referenced-by-count":49,"title":["Comments on the analysis of unbalanced microarray data"],"prefix":"10.1093","volume":"25","author":[{"given":"Kathleen F.","family":"Kerr","sequence":"first","affiliation":[{"name":"Department of Biostatistics, Box 357232, University of Washington, Seattle, WA 98195, USA"}]}],"member":"286","published-online":{"date-parts":[[2009,6,15]]},"reference":[{"key":"2023013112093444400_B1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/S0167-9473(01)00046-9","article-title":"A mixture model approach for the analysis of microarray gene expression data","volume":"39","author":"Allison","year":"2002","journal-title":"Comput. Stat. Data Anal."},{"key":"2023013112093444400_B2","doi-asserted-by":"crossref","first-page":"55","DOI":"10.1038\/nrg1749","article-title":"Microarray data analysis: from disarray to consolidation and consensus","volume":"7","author":"Allison","year":"2006","journal-title":"Nat. Rev. Genet."},{"key":"2023013112093444400_B3","doi-asserted-by":"crossref","first-page":"289","DOI":"10.1111\/j.2517-6161.1995.tb02031.x","article-title":"Controlling the false discovery rate\u2014a practical and powerful approach to multiple testing","volume":"57","author":"Benjamini","year":"1995","journal-title":"J. R. Stat. Soc. B Methodol."},{"key":"2023013112093444400_B4","doi-asserted-by":"crossref","first-page":"756","DOI":"10.1002\/bimj.200710471","article-title":"Partitioning to uncover conditions for permutation tests to control multiple testing error rates","volume":"50","author":"Calian","year":"2008","journal-title":"Biom. J."},{"key":"2023013112093444400_B5","doi-asserted-by":"crossref","first-page":"36","DOI":"10.2202\/1544-6115.1064","article-title":"Statistical significance threshold criteria for analysis of microarray gene expression data","volume":"3","author":"Cheng","year":"2004","journal-title":"Stat. Appl. Genet. Mol. Biol."},{"key":"2023013112093444400_B6","doi-asserted-by":"crossref","first-page":"59","DOI":"10.1093\/biostatistics\/kxh018","article-title":"Improved statistical tests for differential gene expression by shrinking variance components estimates","volume":"6","author":"Cui","year":"2005","journal-title":"Biostatistics"},{"key":"2023013112093444400_B7","doi-asserted-by":"crossref","first-page":"660","DOI":"10.1093\/bioinformatics\/bti063","article-title":"A simple procedure for estimating the false discovery rate","volume":"21","author":"Dalmasso","year":"2005","journal-title":"Bioinformatics"},{"key":"2023013112093444400_B8","doi-asserted-by":"crossref","first-page":"627","DOI":"10.1111\/j.1467-9876.2005.05593.x","article-title":"A Bayesian mixture model for differential gene expression","volume":"54","author":"Do","year":"2005","journal-title":"J. R. Stat. Soc. C Appl. Stat."},{"key":"2023013112093444400_B9","doi-asserted-by":"crossref","first-page":"71","DOI":"10.1214\/ss\/1056397487","article-title":"Multiple hypothesis testing in microarray experiments","volume":"18","author":"Dudoit","year":"2003","journal-title":"Stat. Sci."},{"key":"2023013112093444400_B10","first-page":"1351","article-title":"Size, power and false discovery rates","volume":"4","author":"Efron","year":"2007","journal-title":"Ann. Stat."},{"key":"2023013112093444400_B11","doi-asserted-by":"crossref","first-page":"R69","DOI":"10.1186\/gb-2007-8-5-r69","article-title":"Towards the uniform distribution of null p values on affymetrix microarrays","volume":"8","author":"Fodor","year":"2007","journal-title":"Genome Biol."},{"key":"2023013112093444400_B12","doi-asserted-by":"crossref","first-page":"1486","DOI":"10.1093\/bioinformatics\/btl109","article-title":"Construction of null statistics in permutation-based multiple testing for multi-factorial microarray experiments","volume":"22","author":"Gao","year":"2006","journal-title":"Bioinformatics"},{"key":"2023013112093444400_B13","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1007\/BF02595811","article-title":"Resampling-based multiple testing for microarray data analysis","volume":"12","author":"Ge","year":"2003","journal-title":"Test"},{"key":"2023013112093444400_B14","doi-asserted-by":"crossref","first-page":"R80","DOI":"10.1186\/gb-2004-5-10-r80","article-title":"Bioconductor: open software development for computational biology and bioinformatics","volume":"5","author":"Gentleman","year":"2004","journal-title":"Genome Biol."},{"key":"2023013112093444400_B15","doi-asserted-by":"crossref","first-page":"14","DOI":"10.2202\/1544-6115.1285","article-title":"Inference on the limiting false discovery rate and the p-value threshold parameter assuming weak dependence between gene expression levels within subject","volume":"6","author":"Heller","year":"2007","journal-title":"Stat. Appl. Genet. Mol. Biol."},{"key":"2023013112093444400_B16","doi-asserted-by":"crossref","first-page":"i390","DOI":"10.1093\/bioinformatics\/btn142","article-title":"Differential variability analysis of gene expression and its application to human diseases","volume":"24","author":"Ho","year":"2008","journal-title":"Bioinformatics"},{"key":"2023013112093444400_B17","doi-asserted-by":"crossref","first-page":"2244","DOI":"10.1093\/bioinformatics\/btl383","article-title":"To permute or not to permute","volume":"22","author":"Huang","year":"2006","journal-title":"Bioinformatics"},{"key":"2023013112093444400_B18","doi-asserted-by":"crossref","first-page":"555","DOI":"10.1111\/j.1467-9868.2005.00515.x","article-title":"Estimating the proportion of true null hypotheses, with application to dna microarray data","volume":"67","author":"Langaas","year":"2005","journal-title":"J. R. Stat. Soc. B Stat. Methodol."},{"key":"2023013112093444400_B19","doi-asserted-by":"crossref","first-page":"e161","DOI":"10.1371\/journal.pgen.0030161","article-title":"Capturing heterogeneity in gene expression studies by \u201csurrogate variable analysis\u201d","volume":"3","author":"Leek","year":"2007","journal-title":"PLoS Genet."},{"key":"2023013112093444400_B20","doi-asserted-by":"crossref","first-page":"411","DOI":"10.1111\/j.1467-9868.2005.00509.x","article-title":"Variance of the number of false discoveries","volume":"67","author":"Owen","year":"2005","journal-title":"J. R. Stat. Soc. B Stat. Methodol."},{"key":"2023013112093444400_B21","doi-asserted-by":"crossref","first-page":"84","DOI":"10.1186\/1471-2105-7-84","article-title":"The powerAtlas: a power and sample size atlas for microarray experimental design and research","volume":"7","author":"Page","year":"2006","journal-title":"BMC Bioinformatics"},{"key":"2023013112093444400_B22","doi-asserted-by":"crossref","first-page":"3025","DOI":"10.1093\/bioinformatics\/btl527","article-title":"Estimation of false discovery proportion under general dependence","volume":"22","author":"Pawitan","year":"2006","journal-title":"Bioinformatics"},{"key":"2023013112093444400_B23","first-page":"3","article-title":"Multiple testing for gene expression data: an investigation of null distributions with consequences for the permutation test","author":"Pollard","year":"2003","journal-title":"Proceedings of the 2003 International MultiConference in Computer Science and Engineering"},{"key":"2023013112093444400_B24","doi-asserted-by":"crossref","first-page":"1737","DOI":"10.1093\/bioinformatics\/bth160","article-title":"Improving false discovery rate estimation","volume":"20","author":"Pounds","year":"2004","journal-title":"Bioinformatics"},{"key":"2023013112093444400_B25","doi-asserted-by":"crossref","first-page":"1979","DOI":"10.1093\/bioinformatics\/btl328","article-title":"Robust estimation of the false discovery rate","volume":"22","author":"Pounds","year":"2006","journal-title":"Bioinformatics"},{"key":"2023013112093444400_B26","doi-asserted-by":"crossref","first-page":"1236","DOI":"10.1093\/bioinformatics\/btg148","article-title":"Estimating the occurrence of false positives and false negatives in microarray studies by approximating and partitioning the empirical distribution of p-values","volume":"19","author":"Pounds","year":"2003","journal-title":"Bioinformatics"},{"key":"2023013112093444400_B27","doi-asserted-by":"crossref","first-page":"5471","DOI":"10.1093\/nar\/gkh866","article-title":"Empirical evaluation of data transformations and ranking statistics for microarray analysis","volume":"32","author":"Qin","year":"2004","journal-title":"Nucleic Acids Res."},{"key":"2023013112093444400_B28","doi-asserted-by":"crossref","first-page":"1057","DOI":"10.1142\/S0219720006002338","article-title":"Some comments on instability of false discovery rate estimation","volume":"4","author":"Qiu","year":"2006","journal-title":"J. Bioinform. Comput. Biol."},{"key":"2023013112093444400_B29","article-title":"Correlation between gene expression levels and limitations of the empirical Bayes methodology for finding differentially expressed genes","volume":"4","author":"Qiu","year":"2004","journal-title":"Stat. Appl. Genet. Mol. Biol."},{"key":"2023013112093444400_B30","doi-asserted-by":"crossref","first-page":"120","DOI":"10.1186\/1471-2105-6-120","article-title":"The effects of normalization on the correlation structure of microarray data","volume":"6","author":"Qiu","year":"2005","journal-title":"BMC Bioinformatics"},{"key":"2023013112093444400_B31","doi-asserted-by":"crossref","first-page":"483","DOI":"10.1111\/j.1541-0420.2006.00704.x","article-title":"Exploring the information in p-values for the analysis and planning of multiple-test experiments","volume":"63","author":"Ruppert","year":"2007","journal-title":"Biometrics"},{"key":"2023013112093444400_B32","doi-asserted-by":"crossref","DOI":"10.2202\/1544-6115.1027","article-title":"Linear models and empirical Bayes methods for assessing differential expression in microarray experiments","volume":"3","author":"Smyth","year":"2004","journal-title":"Stat. Appl. Genet. Mol. Biol."},{"key":"2023013112093444400_B33","doi-asserted-by":"crossref","first-page":"397","DOI":"10.1007\/0-387-29362-0_23","article-title":"Limma: linear models for microarray data","volume-title":"Bioinformatics and Computational Biology Solutions using R and Bioconductor","author":"Smyth","year":"2005"},{"key":"2023013112093444400_B34","doi-asserted-by":"crossref","first-page":"479","DOI":"10.1111\/1467-9868.00346","article-title":"A direct approach to false discovery rates","volume":"64","author":"Storey","year":"2002","journal-title":"J. R. Stat. Soc. B Stat. Methodol."},{"key":"2023013112093444400_B35","doi-asserted-by":"crossref","first-page":"9440","DOI":"10.1073\/pnas.1530509100","article-title":"Statistical significance for genomewide studies","volume":"100","author":"Storey","year":"2003","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023013112093444400_B36","doi-asserted-by":"crossref","first-page":"5116","DOI":"10.1073\/pnas.091062498","article-title":"Significance analysis of microarrays applied to the ionizing radiation response","volume":"98","author":"Tusher","year":"2001","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023013112093444400_B37","doi-asserted-by":"crossref","first-page":"4280","DOI":"10.1093\/bioinformatics\/bti685","article-title":"A note on using permutation-based false discovery rate estimates to compare different analysis methods for microarray data","volume":"21","author":"Xie","year":"2005","journal-title":"Bioinformatics"},{"key":"2023013112093444400_B38","doi-asserted-by":"crossref","first-page":"52","DOI":"10.1002\/bimj.200610307","article-title":"Applying the generalized partitioning principle to control the generalized familywise error rate","volume":"49","author":"Xu","year":"2007","journal-title":"Biom. J."},{"key":"2023013112093444400_B39","doi-asserted-by":"crossref","first-page":"38","DOI":"10.1093\/bioinformatics\/btl548","article-title":"Estimating p-values in small microarray experiments","volume":"23","author":"Yang","year":"2007","journal-title":"Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/25\/16\/2035\/48993402\/bioinformatics_25_16_2035.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/25\/16\/2035\/48993402\/bioinformatics_25_16_2035.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,2,10]],"date-time":"2025-02-10T04:42:19Z","timestamp":1739162539000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/25\/16\/2035\/205153"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2009,6,15]]},"references-count":39,"journal-issue":{"issue":"16","published-print":{"date-parts":[[2009,8,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btp363","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2009,8,15]]},"published":{"date-parts":[[2009,6,15]]}}}