{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,1,12]],"date-time":"2025-01-12T18:10:04Z","timestamp":1736705404117,"version":"3.32.0"},"reference-count":33,"publisher":"Oxford University Press (OUP)","issue":"4","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2007,2,15]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Motivation: An important goal of microarray studies is to discover genes that are associated with clinical outcomes, such as disease status and patient survival. While a typical experiment surveys gene expressions on a global scale, there may be only a small number of genes that have significant influence on a clinical outcome. Moreover, expression data have cluster structures and the genes within a cluster have correlated expressions and coordinated functions, but the effects of individual genes in the same cluster may be different. Accordingly, we seek to build statistical models with the following properties. First, the model is sparse in the sense that only a subset of the parameter vector is non-zero. Second, the cluster structures of gene expressions are properly accounted for.<\/jats:p><jats:p>Results: For gene expression data without pathway information, we divide genes into clusters using commonly used methods, such as K-means or hierarchical approaches. The optimal number of clusters is determined using the Gap statistic. We propose a clustering threshold gradient descent regularization (CTGDR) method, for simultaneous cluster selection and within cluster gene selection. We apply this method to binary classification and censored survival analysis. Compared to the standard TGDR and other regularization methods, the CTGDR takes into account the cluster structure and carries out feature selection at both the cluster level and within-cluster gene level. 
We demonstrate the CTGDR on two studies of cancer classification and two studies correlating survival of lymphoma patients with microarray expressions.<\/jats:p><jats:p>Availability: R code is available upon request.<\/jats:p><jats:p>Contact: shuangge.ma@yale.edu<\/jats:p><jats:p>Supplementary information: Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btl632","type":"journal-article","created":{"date-parts":[[2006,12,21]],"date-time":"2006-12-21T01:34:06Z","timestamp":1166664846000},"page":"466-472","source":"Crossref","is-referenced-by-count":23,"title":["Clustering threshold gradient descent regularization: with applications to microarray studies"],"prefix":"10.1093","volume":"23","author":[{"given":"Shuangge","family":"Ma","sequence":"first","affiliation":[{"name":"Department of Epidemiology and Public Health, Yale University, New Haven, CT, USA"}]},{"given":"Jian","family":"Huang","sequence":"additional","affiliation":[{"name":"Departments of Statistics and Actuarial Science, University of Iowa, Iowa City, IA, USA"}]}],"member":"286","published-online":{"date-parts":[[2006,12,20]]},"reference":[{"key":"2023041109270286700_","doi-asserted-by":"crossref","first-page":"6745","DOI":"10.1073\/pnas.96.12.6745","article-title":"Broad Patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays","volume":"96","author":"Alon","year":"1999","journal-title":"Proc. Natl Acad. Sci. 
USA"},{"key":"2023041109270286700_","doi-asserted-by":"crossref","first-page":"503","DOI":"10.1038\/35000501","article-title":"Distinct types of diffuse large B-Cell lymphoma identified by gene expression profiling","volume":"403","author":"Alizadeh","year":"2000","journal-title":"Nature"},{"volume-title":"Classification and Regression Trees","year":"1984","author":"Breiman","key":"2023041109270286700_"},{"key":"2023041109270286700_","first-page":"511","article-title":"How well do we understand the clusters found in microarray data?","volume":"2","author":"Clare","year":"2002","journal-title":"In Silico Biol."},{"key":"2023041109270286700_","doi-asserted-by":"crossref","first-page":"187","DOI":"10.1111\/j.2517-6161.1972.tb00899.x","article-title":"Regression models and life-tables (with discussion)","volume":"34","author":"Cox","year":"1972","journal-title":"J. R. Stat. Soc. Ser. B"},{"key":"2023041109270286700_","doi-asserted-by":"crossref","first-page":"2159","DOI":"10.1056\/NEJMoa041869","article-title":"Prediction of survival in follicular lymphoma based on molecular features of tumor-infiltrating immune cells","volume":"351","author":"Dave","year":"2004","journal-title":"N. Engl. J. 
Med."},{"key":"2023041109270286700_","doi-asserted-by":"crossref","first-page":"1061","DOI":"10.1093\/bioinformatics\/btf867","article-title":"Boosting for tumor classification with gene expression data","volume":"19","author":"Dettling","year":"2003","journal-title":"Bioinformatics"},{"key":"2023041109270286700_","doi-asserted-by":"crossref","first-page":"77","DOI":"10.1198\/016214502753479248","article-title":"Comparison of discrimination methods for tumor classification based on microarray data","volume":"97","author":"Dudoit","year":"2002","journal-title":"JASA"},{"key":"2023041109270286700_","doi-asserted-by":"crossref","first-page":"14863","DOI":"10.1073\/pnas.95.25.14863","article-title":"Cluster analysis and display of genome-wide expression patterns","volume":"95","author":"Eisen","year":"1998","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023041109270286700_","doi-asserted-by":"crossref","first-page":"1189","DOI":"10.1214\/aos\/1013203451","article-title":"Greedy function approximation: a gradient boosting machine","volume":"29","author":"Friedman","year":"2001","journal-title":"Ann. Stat."},{"key":"2023041109270286700_","article-title":"Gradient directed regularization for linear regression and classification","volume-title":"Technical report","author":"Friedman","year":"2004"},{"key":"2023041109270286700_","doi-asserted-by":"crossref","first-page":"13784","DOI":"10.1073\/pnas.241500798","article-title":"Diversity of gene expression in adenocarcinoma of the lung","volume":"98","author":"Garber","year":"2001","journal-title":"Proc. Natl Acad. Sci. 
USA"},{"volume-title":"Matrix Computations","year":"1996","author":"Golub","key":"2023041109270286700_"},{"key":"2023041109270286700_","doi-asserted-by":"crossref","DOI":"10.1201\/9780367805302","volume-title":"Classification","author":"Gordon","year":"1999"},{"key":"2023041109270286700_","doi-asserted-by":"crossref","first-page":"3001","DOI":"10.1093\/bioinformatics\/bti422","article-title":"Penalized Cox regression analysis in the high-dimensional and low-sample size settings, with applications to microarray gene expression data","volume":"21","author":"Gui","year":"2005","journal-title":"Bioinformatics"},{"key":"2023041109270286700_","first-page":"272","article-title":"Threshold gradient descent method for censored data regression with applications in pharmacogenomics","volume":"10","author":"Gui","year":"2005","journal-title":"Proc. Pac. Symp. Biocomput."},{"key":"2023041109270286700_","doi-asserted-by":"crossref","first-page":"258","DOI":"10.1093\/nar\/gkh036","article-title":"The Gene Ontology (GO) database and informatics resource","volume":"32","author":"Harris","year":"2004","journal-title":"Nucleic Acids Res."},{"key":"2023041109270286700_","doi-asserted-by":"crossref","first-page":"813","DOI":"10.1111\/j.1541-0420.2006.00562.x","article-title":"Regularized estimation in the accelerated failure time model with high dimensional covariates","volume":"62","author":"Huang","year":"2006","journal-title":"Biometrics"},{"volume-title":"Applied Multivariate Statistical Analysis","year":"2002","author":"Johnson","key":"2023041109270286700_"},{"key":"2023041109270286700_","doi-asserted-by":"crossref","first-page":"4356","DOI":"10.1093\/bioinformatics\/bti724","article-title":"Regularized ROC method for disease classification and biomarker selection with microarray 
data","volume":"21","author":"Ma","year":"2005","journal-title":"Bioinformatics"},{"key":"2023041109270286700_","doi-asserted-by":"crossref","first-page":"202","DOI":"10.1111\/j.1541-0420.2005.00405.x","article-title":"Additive risk models for survival data with high dimensional covariates","volume":"62","author":"Ma","year":"2006","journal-title":"Biometrics"},{"key":"2023041109270286700_","doi-asserted-by":"crossref","DOI":"10.1002\/047172842X","volume-title":"Analyzing Microarray Gene Expression Data","author":"McLachlan","year":"2004"},{"key":"2023041109270286700_","doi-asserted-by":"crossref","first-page":"39","DOI":"10.1093\/bioinformatics\/18.1.39","article-title":"Tumor classification by partial least squares using microarray gene expression data","volume":"18","author":"Nguyen","year":"2002","journal-title":"Bioinformatics"},{"key":"2023041109270286700_","doi-asserted-by":"crossref","first-page":"3185","DOI":"10.1093\/bioinformatics\/bth383","article-title":"Systematic benchmarking of microarray data classification: assessing the role of non-linearity and dimensionality reduction","volume":"20","author":"Pochet","year":"2004","journal-title":"Bioinformatics"},{"key":"2023041109270286700_","doi-asserted-by":"crossref","first-page":"185","DOI":"10.1016\/S1535-6108(03)00028-X","article-title":"The proliferation gene expression signature is a quantitative integrator of oncogenic events that predicts survival in mantle cell lymphoma","volume":"3","author":"Rosenwald","year":"2003","journal-title":"Cancer Cell"},{"year":"2001","author":"Spang","article-title":"Prediction and uncertainty in the analysis of gene expression profiles","key":"2023041109270286700_"},{"key":"2023041109270286700_","doi-asserted-by":"crossref","first-page":"281","DOI":"10.1038\/10343","article-title":"Systematic determination of genetic network architecture","volume":"22","author":"Tavazoie","year":"1999","journal-title":"Nat. 
Genet."},{"key":"2023041109270286700_","doi-asserted-by":"crossref","first-page":"2907","DOI":"10.1073\/pnas.96.6.2907","article-title":"Interpreting patterns of gene expression with self-organizing maps: methods and applications to hematopoietic differentiation","volume":"96","author":"Tamayo","year":"1999","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023041109270286700_","article-title":"Clustering methods for the analysis of DNA microarray data","volume-title":"Manuscript.","author":"Tibshirani","year":"1999"},{"key":"2023041109270286700_","doi-asserted-by":"crossref","DOI":"10.1137\/1.9781611970128","volume-title":"Spline models for observational data","author":"Wahba","year":"1990"},{"key":"2023041109270286700_","article-title":"Nonparametric pathway-based regression models for analysis of genomic data","volume-title":"University of Pennsylvania Biostatistics Working Papers, Year 2006","author":"Wei","year":"2006"},{"key":"2023041109270286700_","doi-asserted-by":"crossref","first-page":"11562","DOI":"10.1073\/pnas.201162998","article-title":"Predicting the clinical status of human breast cancer by using gene expression profiles","volume":"98","author":"West","year":"2001","journal-title":"Proc. Natl Acad. Sci. 
USA"},{"key":"2023041109270286700_","doi-asserted-by":"crossref","first-page":"309","DOI":"10.1093\/bioinformatics\/17.4.309","article-title":"Validating clustering for gene expression data","volume":"17","author":"Yeung","year":"2001","journal-title":"Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/23\/4\/466\/49829784\/bioinformatics_23_4_466.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/23\/4\/466\/49829784\/bioinformatics_23_4_466.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,1,12]],"date-time":"2025-01-12T17:44:22Z","timestamp":1736703862000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/23\/4\/466\/181747"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2006,12,20]]},"references-count":33,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2007,2,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btl632","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"type":"electronic","value":"1367-4811"},{"type":"print","value":"1367-4803"}],"subject":[],"published-other":{"date-parts":[[2007,2,15]]},"published":{"date-parts":[[2006,12,20]]}}}