{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,25]],"date-time":"2026-03-25T04:17:22Z","timestamp":1774412242985,"version":"3.50.1"},"reference-count":33,"publisher":"Oxford University Press (OUP)","issue":"20","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2009,10,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Biomarker discovery and gene ranking is a standard task in genomic high-throughput analysis. Typically, the ordering of markers is based on a stabilized variant of the t-score, such as the moderated t or the SAM statistic. However, these procedures ignore gene\u2013gene correlations, which may have a profound impact on the gene orderings and on the power of the subsequent tests.<\/jats:p>\n               <jats:p>Results: We propose a simple procedure that adjusts gene-wise t-statistics to take account of correlations among genes. The resulting correlation-adjusted t-scores (\u2018cat\u2019 scores) are derived from a predictive perspective, i.e. as a score for variable selection to discriminate group membership in two-class linear discriminant analysis. In the absence of correlation the cat score reduces to the standard t-score. Moreover, using the cat score it is straightforward to evaluate groups of features (i.e. gene sets). For computation of the cat score from small sample data, we propose a shrinkage procedure. In a comparative study comprising six different synthetic and empirical correlation structures, we show that the cat score improves estimation of gene orderings and leads to higher power for fixed true discovery rate, and vice versa. 
Finally, we also illustrate the cat score by analyzing metabolomic data.<\/jats:p>\n               <jats:p>Availability: The shrinkage cat score is implemented in the R package \u2018st\u2019, which is freely available under the terms of the GNU General Public License (version 3 or later) from CRAN (http:\/\/cran.r-project.org\/web\/packages\/st\/).<\/jats:p>\n               <jats:p>Contact: strimmer@uni-leipzig.de<\/jats:p>","DOI":"10.1093\/bioinformatics\/btp460","type":"journal-article","created":{"date-parts":[[2009,8,1]],"date-time":"2009-08-01T00:14:58Z","timestamp":1249085698000},"page":"2700-2707","source":"Crossref","is-referenced-by-count":77,"title":["Gene ranking and biomarker discovery under correlation"],"prefix":"10.1093","volume":"25","author":[{"given":"Verena","family":"Zuber","sequence":"first","affiliation":[{"name":"Institute for Medical Informatics, Statistics and Epidemiology (IMISE), University of Leipzig, H\u00e4rtelstr. 16-18, 04107 Leipzig, Germany"}]},{"given":"Korbinian","family":"Strimmer","sequence":"additional","affiliation":[{"name":"Institute for Medical Informatics, Statistics and Epidemiology (IMISE), University of Leipzig, H\u00e4rtelstr. 16-18, 04107 Leipzig, Germany"}]}],"member":"286","published-online":{"date-parts":[[2009,7,30]]},"reference":[{"key":"2023013112125764900_B1","doi-asserted-by":"crossref","first-page":"47","DOI":"10.1186\/1471-2105-10-47","article-title":"A general modular framework for gene set enrichment","volume":"10","author":"Ackermann","year":"2009","journal-title":"BMC Bioinformatics"},{"key":"2023013112125764900_B2","article-title":"Feature selection in omics prediction problems using cat scores and false non-discovery rate control","author":"Ahdesm\u00e4ki","year":"2009","journal-title":"Ann. Appl. 
Stat."},{"key":"2023013112125764900_B3","doi-asserted-by":"crossref","first-page":"6745","DOI":"10.1073\/pnas.96.12.6745","article-title":"Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays","volume":"96","author":"Alon","year":"1999","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023013112125764900_B4","doi-asserted-by":"crossref","first-page":"989","DOI":"10.3150\/bj\/1106314847","article-title":"Some theory for Fisher's linear discriminant function, \u2018naive Bayes\u2019, and some alternatives when there are many more variables than observations","volume":"10","author":"Bickel","year":"2004","journal-title":"Bernoulli"},{"key":"2023013112125764900_B5","doi-asserted-by":"crossref","first-page":"115","DOI":"10.1111\/j.1541-0420.2007.00843.x","article-title":"Simultaneous regression shrinkage, variable selection, and supervised clustering of predictors with OSCAR","volume":"64","author":"Bondell","year":"2008","journal-title":"Biometrics"},{"key":"2023013112125764900_B6","doi-asserted-by":"crossref","first-page":"R16","DOI":"10.1186\/gb-2005-6-2-r16","article-title":"Preferred analysis methods for Affymetrix GeneChips revealed by a wholly defined control data set","volume":"6","author":"Choe","year":"2005","journal-title":"Genome Biology"},{"key":"2023013112125764900_B7","doi-asserted-by":"crossref","first-page":"93","DOI":"10.1198\/016214506000001211","article-title":"Correlation and large-scale simultaneous significance testing","volume":"102","author":"Efron","year":"2007","journal-title":"J. Am. Stat. Assoc."},{"key":"2023013112125764900_B8","first-page":"1","article-title":"Microarrays, empirical Bayes, and the two-groups model","volume":"23","author":"Efron","year":"2008","journal-title":"Stat. 
Sci."},{"key":"2023013112125764900_B9","doi-asserted-by":"crossref","first-page":"2605","DOI":"10.1214\/07-AOS504","article-title":"High-dimensional classification using features annealed independence rules","volume":"36","author":"Fan","year":"2008","journal-title":"Ann. Stat."},{"key":"2023013112125764900_B10","doi-asserted-by":"crossref","DOI":"10.1109\/BIBMW.2008.4686237","article-title":"Graph-constrained discriminant analysis of functional genomics data","volume-title":"IEEE International Conference on Bioinformatics and Biomedicine","author":"Guillemot","year":"2008"},{"key":"2023013112125764900_B11","doi-asserted-by":"crossref","first-page":"86","DOI":"10.1093\/biostatistics\/kxj035","article-title":"Regularized discriminant analysis and its application in microarrays","volume":"8","author":"Guo","year":"2007","journal-title":"Biostatistics"},{"key":"2023013112125764900_B12","doi-asserted-by":"crossref","first-page":"427","DOI":"10.1111\/j.1467-9868.2005.00510.x","article-title":"Geometric representation of high dimension, low sample size data","volume":"67","author":"Hall","year":"2005","journal-title":"J. R. Stat. Soc. B"},{"key":"2023013112125764900_B13","first-page":"1","article-title":"Classifier technology and the illusion of progress","volume":"21","author":"Hand","year":"2006","journal-title":"Stat. Sci."},{"key":"2023013112125764900_B14","doi-asserted-by":"crossref","first-page":"15","DOI":"10.2202\/1544-6115.1435","article-title":"Breast cancer diagnosis from proteomic mass spectrometry data: a comparative evaluation","volume":"7","author":"Hand","year":"2008","journal-title":"Stat. Appl. Genet. Mol. Biol."},{"key":"2023013112125764900_B15","doi-asserted-by":"crossref","first-page":"539","DOI":"10.1056\/NEJM200102223440801","article-title":"Gene-expression profiles in hereditary breast cancer","volume":"344","author":"Hedenfalk","year":"2001","journal-title":"N. Engl. J. 
Med."},{"key":"2023013112125764900_B16","doi-asserted-by":"crossref","first-page":"773","DOI":"10.1215\/S0012-7094-48-01568-3","article-title":"The central limit theorem for dependent random variables","volume":"15","author":"Hoeffding","year":"1948","journal-title":"Duke Math. J."},{"key":"2023013112125764900_B17","doi-asserted-by":"crossref","first-page":"2373","DOI":"10.1093\/bioinformatics\/btl401","article-title":"A multivariate approach for integrating genome-wide expression data and biological knowledge","volume":"22","author":"Kong","year":"2006","journal-title":"Bioinformatics"},{"key":"2023013112125764900_B18","doi-asserted-by":"crossref","first-page":"666","DOI":"10.1093\/bioinformatics\/btm507","article-title":"Genome-wide co-expression based prediction of differential expression","volume":"24","author":"Lai","year":"2008","journal-title":"Bioinformatics"},{"key":"2023013112125764900_B19","doi-asserted-by":"crossref","first-page":"235","DOI":"10.1002\/bimj.200800207","article-title":"High-dimensional data analysis: selection of variables, data compression and graphics \u2014 applications to gene expression","volume":"51","author":"L\u00e4uter","year":"2009","journal-title":"Biometr. 
J."},{"key":"2023013112125764900_B20","doi-asserted-by":"crossref","first-page":"1175","DOI":"10.1093\/bioinformatics\/btn081","article-title":"Network-constrained regularization and variable selection for analysis of genomic data","volume":"24","author":"Li","year":"2008","journal-title":"Bioinformatics"},{"key":"2023013112125764900_B21","doi-asserted-by":"crossref","first-page":"3105","DOI":"10.1093\/bioinformatics\/bti496","article-title":"Hotelling's T2 multivariate profiling for detecting differential expression in microarrays","volume":"21","author":"Lu","year":"2005","journal-title":"Bioinformatics"},{"key":"2023013112125764900_B22","doi-asserted-by":"crossref","first-page":"765","DOI":"10.1093\/bioinformatics\/btp053","article-title":"Testing significance relative to fold-change threshold is a TREAT","volume":"25","author":"McCarthy","year":"2009","journal-title":"Bioinformatics"},{"key":"2023013112125764900_B23","doi-asserted-by":"crossref","first-page":"9","DOI":"10.2202\/1544-6115.1252","article-title":"Accurate ranking of differentially expressed genes by a distribution-free shrinkage approach","volume":"6","author":"Opgen-Rhein","year":"2007","journal-title":"Stat. Appl. Genet. Mol. Biol."},{"key":"2023013112125764900_B24","doi-asserted-by":"crossref","first-page":"115","DOI":"10.1016\/S0167-7152(99)00146-7","article-title":"A more general central limit theorem for m-dependent random variables with unbounded m","volume":"47","author":"Romano","year":"2000","journal-title":"Stat. Probab. Lett."},{"key":"2023013112125764900_B25","doi-asserted-by":"crossref","first-page":"32","DOI":"10.2202\/1544-6115.1175","article-title":"A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics","volume":"4","author":"Sch\u00e4fer","year":"2005","journal-title":"Stat. Appl. Genet. Mol. 
Biol."},{"key":"2023013112125764900_B26","doi-asserted-by":"crossref","first-page":"458","DOI":"10.1093\/biostatistics\/kxm047","article-title":"Significance levels for studies with correlated test statistics","volume":"9","author":"Shi","year":"2008","journal-title":"Biostatistics"},{"key":"2023013112125764900_B27","doi-asserted-by":"crossref","first-page":"3","DOI":"10.2202\/1544-6115.1027","article-title":"Linear models and empirical Bayes methods for assessing differential expression in microarray experiments","volume":"3","author":"Smyth","year":"2004","journal-title":"Stat. Appl. Genet. Mol. Biol."},{"key":"2023013112125764900_B28","doi-asserted-by":"crossref","first-page":"910","DOI":"10.1038\/nature07762","article-title":"Metabolomic profiles delineate potential role for sarcosine in prostate cancer progression","volume":"457","author":"Sreekumar","year":"2009","journal-title":"Nature"},{"key":"2023013112125764900_B29","doi-asserted-by":"crossref","first-page":"1461","DOI":"10.1093\/bioinformatics\/btn209","article-title":"fdrtool: a versatile R package for estimating local and tail area-based false discovery rates","volume":"24","author":"Strimmer","year":"2008","journal-title":"Bioinformatics"},{"key":"2023013112125764900_B30","doi-asserted-by":"crossref","first-page":"303","DOI":"10.1186\/1471-2105-9-303","article-title":"A unified approach to false discovery rate estimation","volume":"9","author":"Strimmer","year":"2008","journal-title":"BMC Bioinformatics"},{"key":"2023013112125764900_B31","doi-asserted-by":"crossref","first-page":"3170","DOI":"10.1093\/bioinformatics\/btm488","article-title":"Incorporating prior knowledge of gene functional groups into regularized discriminant analysis of microarray data","volume":"23","author":"Tai","year":"2007","journal-title":"Bioinformatics"},{"key":"2023013112125764900_B32","author":"Tibshirani","year":"2006","journal-title":"Correlation-sharing for detection of differential gene 
expression"},{"key":"2023013112125764900_B33","doi-asserted-by":"crossref","first-page":"5116","DOI":"10.1073\/pnas.091062498","article-title":"Significance analysis of microarrays applied to the ionizing radiation response","volume":"98","author":"Tusher","year":"2001","journal-title":"Proc. Natl Acad. Sci. USA"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/25\/20\/2700\/48993309\/bioinformatics_25_20_2700.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/25\/20\/2700\/48993309\/bioinformatics_25_20_2700.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,31]],"date-time":"2023-01-31T21:40:03Z","timestamp":1675201203000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/25\/20\/2700\/193240"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2009,7,30]]},"references-count":33,"journal-issue":{"issue":"20","published-print":{"date-parts":[[2009,10,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btp460","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2009,10,15]]},"published":{"date-parts":[[2009,7,30]]}}}