{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,17]],"date-time":"2025-10-17T13:31:22Z","timestamp":1760707882713,"version":"3.33.0"},"reference-count":27,"publisher":"Oxford University Press (OUP)","issue":"23","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2007,12,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Motivation: Discriminant analysis for high-dimensional and low-sample-sized data has become a hot research topic in bioinformatics, mainly motivated by its importance and challenge in applications to tumor classifications for high-dimensional microarray data. Two of the popular methods are the nearest shrunken centroids, also called predictive analysis of microarray (PAM), and shrunken centroids regularized discriminant analysis (SCRDA). Both methods are modifications to the classic linear discriminant analysis (LDA) in two aspects tailored to high-dimensional and low-sample-sized data: one is the regularization of the covariance matrix, and the other is variable selection through shrinkage. In spite of their usefulness, there are potential limitations with each method. The main concern is that both PAM and SCRDA are possibly too extreme: the covariance matrix in the former is restricted to be diagonal while in the latter there is barely any restriction. Based on the biology of gene functions and given the feature of the data, it may be beneficial to estimate the covariance matrix as an intermediate between the two; furthermore, more effective shrinkage schemes may be possible.<\/jats:p><jats:p>Results: We propose modified LDA methods to integrate biological knowledge of gene functions (or variable groups) into classification of microarray data. 
Instead of simply treating all the genes independently or imposing no restriction on the correlations among the genes, we group the genes according to their biological functions extracted from existing biological knowledge or data, and propose regularized covariance estimators that encourage between-group gene independence and within-group gene correlations while maintaining the flexibility of any general covariance structure. Furthermore, we propose a shrinkage scheme on groups of genes that tends to retain or remove a whole group of the genes altogether, in contrast to the standard shrinkage on individual genes. We show that one of the proposed methods performed better than PAM and SCRDA in a simulation study and several real data examples.<\/jats:p><jats:p>Contact: weip@biostat.umn.edu<\/jats:p>","DOI":"10.1093\/bioinformatics\/btm488","type":"journal-article","created":{"date-parts":[[2007,10,13]],"date-time":"2007-10-13T00:33:48Z","timestamp":1192235628000},"page":"3170-3177","source":"Crossref","is-referenced-by-count":44,"title":["Incorporating prior knowledge of gene functional groups into regularized discriminant analysis of microarray data"],"prefix":"10.1093","volume":"23","author":[{"given":"Feng","family":"Tai","sequence":"first","affiliation":[{"name":"Division of Biostatistics, School of Public Health, University of Minnesota, A460 Mayo Building (MMC 303), Minneapolis, MN 55455-0378, USA"}]},{"given":"Wei","family":"Pan","sequence":"additional","affiliation":[{"name":"Division of Biostatistics, School of Public Health, University of Minnesota, A460 Mayo Building (MMC 303), Minneapolis, MN 55455-0378, USA"}]}],"member":"286","published-online":{"date-parts":[[2007,10,12]]},"reference":[{"key":"2023041107520902300_","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1038\/75556","article-title":"Gene ontology: tool for the unification of biology. 
The Gene Ontology Consortium","volume":"25","author":"Ashburner","year":"2000","journal-title":"Nat. Genet"},{"key":"2023041107520902300_","doi-asserted-by":"crossref","first-page":"41","DOI":"10.1038\/ng765","article-title":"MLL translocations specify a distinct gene expression profile that distinguishes a unique leukemia","volume":"30","author":"Armstrong","year":"2001","journal-title":"Nat. Genet"},{"key":"2023041107520902300_","doi-asserted-by":"crossref","first-page":"898","DOI":"10.1214\/aos\/1018031262","article-title":"Adaptive wavelet estimation: a block thresholding and oracle inequality approach","volume":"27","author":"Cai","year":"1999","journal-title":"Ann. Stat"},{"key":"2023041107520902300_","first-page":"4963","article-title":"Translation of microarray data into clinically relevant cancer diagnostic tests using gene expression ratios in lung cancer and mesothelioma","volume":"62","author":"Gordon","year":"2002","journal-title":"Cancer Res"},{"key":"2023041107520902300_","doi-asserted-by":"crossref","first-page":"3001","DOI":"10.1093\/bioinformatics\/bti422","article-title":"Penalized Cox regression analysis in the high-dimensional and low-sample size settings, with applications to microarray gene expression data","volume":"21","author":"Gui","year":"2005","journal-title":"Bioinformatics"},{"key":"2023041107520902300_","doi-asserted-by":"crossref","first-page":"86","DOI":"10.1093\/biostatistics\/kxj035","article-title":"Regularized linear discriminant analysis and its application in microarrays","volume":"8","author":"Guo","year":"2007","journal-title":"Biostatistics"},{"volume-title":"The Elements of Statistical Learning. 
Data mining, Inference, and Prediction","year":"2001","author":"Hastie","key":"2023041107520902300_"},{"key":"2023041107520902300_","doi-asserted-by":"crossref","first-page":"2072","DOI":"10.1093\/bioinformatics\/btg283","article-title":"Linear regression and two-class classification with gene expression data","volume":"19","author":"Huang","year":"2003","journal-title":"Bioinformatics"},{"key":"2023041107520902300_","doi-asserted-by":"crossref","first-page":"1590","DOI":"10.1016\/S0140-6736(03)13308-9","article-title":"Gene expression predictors of breast cancer outcomes","volume":"361","author":"Huang","year":"2003","journal-title":"Lancet"},{"key":"2023041107520902300_","doi-asserted-by":"crossref","first-page":"1259","DOI":"10.1093\/bioinformatics\/btl065","article-title":"Incorporating biological knowledge into distance-based clustering analysis of microarray gene expression data","volume":"22","author":"Huang","year":"2006","journal-title":"Bioinformatics"},{"key":"2023041107520902300_","first-page":"34","article-title":"Toward pathway engineering: a new database of genetic and molecular pathway","volume":"59","author":"Kanehisa","year":"1996","journal-title":"Sci. Tech. Jpn"},{"key":"2023041107520902300_","doi-asserted-by":"crossref","first-page":"1971","DOI":"10.1093\/bioinformatics\/bti292","article-title":"Molecular decomposition of complex clinical phenotypes using biologically structured analysis of microarray data","volume":"21","author":"Lottaz","year":"2005","journal-title":"Bioinformatics"},{"key":"2023041107520902300_","doi-asserted-by":"crossref","DOI":"10.2202\/1544-6115.1124","article-title":"Incorporating biological information as a prior in an empirical Bayes approach to analyzing microarray data","volume":"4","author":"Pan","year":"2005","journal-title":"Stat. Appl. Genet. Mol. 
Biol"},{"key":"2023041107520902300_","doi-asserted-by":"crossref","first-page":"795","DOI":"10.1093\/bioinformatics\/btl011","article-title":"Incorporating gene functions as priors in model-based clustering of microarray gene expression data","volume":"22","author":"Pan","year":"2006","journal-title":"Bioinformatics"},{"key":"2023041107520902300_","doi-asserted-by":"crossref","first-page":"2028","DOI":"10.1093\/bioinformatics\/btl344","article-title":"Pathway analysis using random forests classification and regression","volume":"22","author":"Pang","year":"2006","journal-title":"Bioinformatics"},{"key":"2023041107520902300_","doi-asserted-by":"crossref","first-page":"203","DOI":"10.1016\/S1535-6108(02)00030-2","article-title":"Gene expression correlates of clinical prostate cancer behavior","volume":"1","author":"Singh","year":"2002","journal-title":"Cancer Cell"},{"key":"2023041107520902300_","doi-asserted-by":"crossref","first-page":"123","DOI":"10.14490\/jjss.37.123","article-title":"Comparison of discrimination methods for high dimensional data","volume":"37","author":"Srivastava","year":"2007","journal-title":"J. Jpn. Stat. Soc"},{"key":"2023041107520902300_","doi-asserted-by":"crossref","first-page":"1775","DOI":"10.1093\/bioinformatics\/btm234","article-title":"Incorporating prior knowledge of predictors into penalized classifiers with multiple penalty terms","volume":"23","author":"Tai","year":"2007","journal-title":"Bioinformatics"},{"key":"2023041107520902300_","article-title":"Incorporating prior knowledge of gene functional groups into regularized discriminant analysis of microarray data","author":"Tai","year":"2007","journal-title":"Research report 2008\u2013020"},{"key":"2023041107520902300_","doi-asserted-by":"crossref","first-page":"267","DOI":"10.1111\/j.2517-6161.1996.tb02080.x","article-title":"Regression shrinkage and selection via the LASSO","volume":"58","author":"Tibshirani","year":"1996","journal-title":"J. R. Stat. 
Soc.,B"},{"key":"2023041107520902300_","doi-asserted-by":"crossref","first-page":"104","DOI":"10.1214\/ss\/1056397488","article-title":"Class prediction by nearest shrunken centroids with applications to DNA Microarrays","volume":"18","author":"Tibshirani","year":"2003","journal-title":"Stat. Sci"},{"volume-title":"Statistical Learning Theory","year":"1998","author":"Vapnik","key":"2023041107520902300_"},{"key":"2023041107520902300_","doi-asserted-by":"crossref","first-page":"972","DOI":"10.1093\/bioinformatics\/btm046","article-title":"Improved centroids estimation for the nearest shrunken centroid classifier","volume":"23","author":"Wang","year":"2007","journal-title":"Bioinformatics"},{"key":"2023041107520902300_","doi-asserted-by":"crossref","first-page":"671","DOI":"10.1016\/S0140-6736(05)17947-1","article-title":"Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer","volume":"365","author":"Wang","year":"2005","journal-title":"Lancet"},{"key":"2023041107520902300_","doi-asserted-by":"crossref","first-page":"265","DOI":"10.1093\/biostatistics\/kxl007","article-title":"Nonparametric pathway-based regression models for analysis of genomic data","volume":"8","author":"Wei","year":"2007","journal-title":"Biostatistics"},{"key":"2023041107520902300_","doi-asserted-by":"crossref","first-page":"472","DOI":"10.1093\/bioinformatics\/bti827","article-title":"Differential gene expression detection and sample classification using penalized linear regression models","volume":"22","author":"Wu","year":"2006","journal-title":"Bioinformatics"},{"key":"2023041107520902300_","doi-asserted-by":"crossref","first-page":"49","DOI":"10.1111\/j.1467-9868.2005.00532.x","article-title":"Model selection and estimation in regression with grouped variables","volume":"68","author":"Yuan","year":"2006","journal-title":"J. R. Stat. Soc. 
B"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/23\/23\/3170\/49824250\/bioinformatics_23_23_3170.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/23\/23\/3170\/49824250\/bioinformatics_23_23_3170.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,1,21]],"date-time":"2025-01-21T16:59:19Z","timestamp":1737478759000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/23\/23\/3170\/289528"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2007,10,12]]},"references-count":27,"journal-issue":{"issue":"23","published-print":{"date-parts":[[2007,12,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btm488","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"type":"electronic","value":"1367-4811"},{"type":"print","value":"1367-4803"}],"subject":[],"published-other":{"date-parts":[[2007,12,1]]},"published":{"date-parts":[[2007,10,12]]}}}