{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,21]],"date-time":"2026-02-21T11:24:38Z","timestamp":1771673078549,"version":"3.50.1"},"reference-count":40,"publisher":"Oxford University Press (OUP)","issue":"24","license":[{"start":{"date-parts":[[2016,10,2]],"date-time":"2016-10-02T00:00:00Z","timestamp":1475366400000},"content-version":"vor","delay-in-days":1805,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/3.0"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2011,12,15]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Motivation: Penalized regression methods have been adopted widely for high-dimensional feature selection and prediction in many bioinformatic and biostatistical contexts. While their theoretical properties are well-understood, specific methodology for their optimal application to genomic data has not been determined.<\/jats:p><jats:p>Results: Through simulation of contrasting scenarios of correlated high-dimensional survival data, we compared the LASSO, Ridge and Elastic Net penalties for prediction and variable selection. We found that a 2D tuning of the Elastic Net penalties was necessary to avoid mimicking the performance of LASSO or Ridge regression. Furthermore, we found that in a simulated scenario favoring the LASSO penalty, a univariate pre-filter made the Elastic Net behave more like Ridge regression, which was detrimental to prediction performance. We demonstrate the real-life application of these methods to predicting the survival of cancer patients from microarray data, and to classification of obese and lean individuals from metagenomic data. Based on these results, we provide an optimized set of guidelines for the application of penalized regression for reproducible class comparison and prediction with genomic data.<\/jats:p><jats:p>Availability and Implementation: A parallelized implementation of the methods presented for regression and for simulation of synthetic data is provided as the pensim R package, available at http:\/\/cran.r-project.org\/web\/packages\/pensim\/index.html.<\/jats:p><jats:p>Contact: \u00a0chuttenh@hsph.harvard.edu; juris@ai.utoronto.ca<\/jats:p><jats:p>Supplementary Information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btr591","type":"journal-article","created":{"date-parts":[[2011,12,7]],"date-time":"2011-12-07T08:06:45Z","timestamp":1323245205000},"page":"3399-3406","source":"Crossref","is-referenced-by-count":74,"title":["Optimized application of penalized regression methods to diverse genomic data"],"prefix":"10.1093","volume":"27","author":[{"given":"Levi","family":"Waldron","sequence":"first","affiliation":[{"name":"1 Department of Biostatistics, Harvard School of Public Health, Boston, MA, USA, 2Department of Biostatistics, Ontario Cancer Institute, University Health Network and 3Ontario Cancer Institute, PMH\/UHN, Campbell Family Institute for Cancer Research, Toronto, ON, Canada"},{"name":"1 Department of Biostatistics, Harvard School of Public Health, Boston, MA, USA, 2Department of Biostatistics, Ontario Cancer Institute, University Health Network and 3Ontario Cancer Institute, PMH\/UHN, Campbell Family Institute for Cancer Research, Toronto, ON, Canada"}]},{"given":"Melania","family":"Pintilie","sequence":"additional","affiliation":[{"name":"1 Department of Biostatistics, Harvard School of Public Health, Boston, MA, USA, 2Department of Biostatistics, Ontario Cancer Institute, University Health Network and 3Ontario Cancer Institute, PMH\/UHN, Campbell Family Institute for Cancer Research, Toronto, ON, Canada"}]},{"given":"Ming-Sound","family":"Tsao","sequence":"additional","affiliation":[{"name":"1 Department of Biostatistics, Harvard School of Public Health, Boston, MA, USA, 2Department of Biostatistics, Ontario Cancer Institute, University Health Network and 3Ontario Cancer Institute, PMH\/UHN, Campbell Family Institute for Cancer Research, Toronto, ON, Canada"}]},{"given":"Frances A.","family":"Shepherd","sequence":"additional","affiliation":[{"name":"1 Department of Biostatistics, Harvard School of Public Health, Boston, MA, USA, 2Department of Biostatistics, Ontario Cancer Institute, University Health Network and 3Ontario Cancer Institute, PMH\/UHN, Campbell Family Institute for Cancer Research, Toronto, ON, Canada"}]},{"given":"Curtis","family":"Huttenhower","sequence":"additional","affiliation":[{"name":"1 Department of Biostatistics, Harvard School of Public Health, Boston, MA, USA, 2Department of Biostatistics, Ontario Cancer Institute, University Health Network and 3Ontario Cancer Institute, PMH\/UHN, Campbell Family Institute for Cancer Research, Toronto, ON, Canada"}]},{"given":"Igor","family":"Jurisica","sequence":"additional","affiliation":[{"name":"1 Department of Biostatistics, Harvard School of Public Health, Boston, MA, USA, 2Department of Biostatistics, Ontario Cancer Institute, University Health Network and 3Ontario Cancer Institute, PMH\/UHN, Campbell Family Institute for Cancer Research, Toronto, ON, Canada"}]}],"member":"286","published-online":{"date-parts":[[2011,10,24]]},"reference":[{"key":"2023012511300549000_B1","doi-asserted-by":"crossref","first-page":"816","DOI":"10.1038\/nm733","article-title":"Gene-expression profiles predict survival of patients with lung adenocarcinoma","volume":"8","author":"Beer","year":"2002","journal-title":"Nat. Med."},{"key":"2023012511300549000_B2","doi-asserted-by":"crossref","DOI":"10.2202\/1544-6115.1226","article-title":"Reader's reaction to \u201cDimension reduction for classification with gene expression microarray data\u201d by Dai et al (2006)","volume":"5","author":"Boulesteix","year":"2006","journal-title":"Stat. Appl. Genet. Mol. Biol."},{"key":"2023012511300549000_B3","doi-asserted-by":"crossref","first-page":"2080","DOI":"10.1093\/bioinformatics\/btm305","article-title":"Predicting survival from microarray data - a comparative study","volume":"23","author":"B\u00f8velstad","year":"2007","journal-title":"Bioinformatics"},{"key":"2023012511300549000_B4","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random Forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Mach. Learn."},{"key":"2023012511300549000_B5","first-page":"477","article-title":"Boosting algorithms: regularization, prediction and model fitting","volume":"22","author":"B\u00fchlmann","year":"2007","journal-title":"Stat. Sci."},{"key":"2023012511300549000_B6","doi-asserted-by":"crossref","first-page":"1190","DOI":"10.1137\/0916069","article-title":"A limited memory algorithm for bound constrained optimization","volume":"16","author":"Byrd","year":"1995","journal-title":"SIAM J. Sci. Comput."},{"key":"2023012511300549000_B7","doi-asserted-by":"crossref","first-page":"11","DOI":"10.1056\/NEJMoa060096","article-title":"A five-gene signature and clinical outcome in non\u2013small-cell lung cancer","volume":"356","author":"Chen","year":"2007","journal-title":"N. Engl. J. Med."},{"key":"2023012511300549000_B8","doi-asserted-by":"crossref","first-page":"187","DOI":"10.1111\/j.2517-6161.1972.tb00899.x","article-title":"Regression models and life-tables","volume":"34","author":"Cox","year":"1972","journal-title":"J. R. Stat. Soc. Ser. B"},{"key":"2023012511300549000_B9","doi-asserted-by":"crossref","first-page":"849","DOI":"10.1111\/j.1467-9868.2008.00674.x","article-title":"Sure independence screening for ultrahigh dimensional feature space","volume":"70","author":"Fan","year":"2008","journal-title":"J. R. Stat. Soc. Ser. B"},{"key":"2023012511300549000_B10","doi-asserted-by":"crossref","first-page":"70","DOI":"10.1002\/bimj.200900028","article-title":"L1 penalized estimation in the Cox proportional hazards model","volume":"52","author":"Goeman","year":"2010","journal-title":"Biometr. J. Biometri. Zeitsch."},{"key":"2023012511300549000_B11","doi-asserted-by":"crossref","first-page":"3001","DOI":"10.1093\/bioinformatics\/bti422","article-title":"Penalized Cox regression analysis in the high-dimensional and low-sample size settings, with applications to microarray gene expression data","volume":"21","author":"Gui","year":"2005","journal-title":"Bioinformatics"},{"key":"2023012511300549000_B12","first-page":"61","article-title":"Model selection: beyond the Bayesian\/frequentist divide","volume":"11","author":"Guyon","year":"2010","journal-title":"J. Mach. Learn. Res."},{"key":"2023012511300549000_B13","doi-asserted-by":"crossref","DOI":"10.1007\/978-1-4757-3462-1","volume-title":"Regression modeling strategies: with applications to linear models, logistic regression, and survival analysis.","author":"Harrell","year":"2001"},{"key":"2023012511300549000_B14","doi-asserted-by":"crossref","first-page":"361","DOI":"10.1002\/(SICI)1097-0258(19960229)15:4<361::AID-SIM168>3.0.CO;2-4","article-title":"Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors","volume":"15","author":"Harrell","year":"1996","journal-title":"Stat. Med."},{"key":"2023012511300549000_B15","doi-asserted-by":"crossref","first-page":"61","DOI":"10.1214\/08-SS035","article-title":"Least angle and \u21131 penalized regression: a review","volume":"2","author":"Hesterberg","year":"2008","journal-title":"Stat. Surv."},{"key":"2023012511300549000_B16","doi-asserted-by":"crossref","first-page":"55","DOI":"10.1080\/00401706.1970.10488634","article-title":"Ridge regression: biased estimation for nonorthogonal problems","volume":"12","author":"Hoerl","year":"1970","journal-title":"Technometrics"},{"key":"2023012511300549000_B17","volume-title":"Applied survival analysis: regression modeling of time to event data.","author":"Hosmer","year":"1999"},{"key":"2023012511300549000_B18","doi-asserted-by":"crossref","first-page":"1990","DOI":"10.1093\/bioinformatics\/btq323","article-title":"Over-optimism in bioinformatics: an illustration","volume":"26","author":"Jelizarow","year":"2010","journal-title":"Bioinformatics"},{"key":"2023012511300549000_B19","doi-asserted-by":"crossref","first-page":"1022","DOI":"10.1038\/4441022a","article-title":"Microbial ecology: human gut microbes associated with obesity","volume":"444","author":"Ley","year":"2006","journal-title":"Nature"},{"key":"2023012511300549000_B20","doi-asserted-by":"crossref","first-page":"488","DOI":"10.1016\/S0140-6736(05)17866-0","article-title":"Prediction of cancer outcome with microarrays: a multiple random validation strategy","volume":"365","author":"Michiels","year":"2005","journal-title":"Lancet"},{"key":"2023012511300549000_B21","doi-asserted-by":"crossref","first-page":"3301","DOI":"10.1093\/bioinformatics\/bti499","article-title":"Prediction error estimation: a comparison of resampling methods","volume":"21","author":"Molinaro","year":"2005","journal-title":"Bioinformatics"},{"key":"2023012511300549000_B22","doi-asserted-by":"crossref","first-page":"59","DOI":"10.1038\/nature08821","article-title":"A human gut microbial gene catalogue established by metagenomic sequencing","volume":"464","author":"Qin","year":"2010","journal-title":"Nature"},{"key":"2023012511300549000_B23","volume-title":"R: A Language and Environment for Statistical Computing.","author":"R Development Core Team","year":"2010"},{"key":"2023012511300549000_B24","doi-asserted-by":"crossref","first-page":"331","DOI":"10.3816\/CCC.2008.n.044","article-title":"Systemic inflammatory response predicts prognosis in patients with advanced-stage colorectal cancer","volume":"7","author":"Sharma","year":"2008","journal-title":"Clin. Colorectal Cancer"},{"key":"2023012511300549000_B25","doi-asserted-by":"crossref","first-page":"822","DOI":"10.1038\/nm.1790","article-title":"Gene expression-based survival prediction in lung adenocarcinoma: a multi-site, blinded validation study","volume":"14","author":"Shedden","year":"2008","journal-title":"Nat. Med."},{"key":"2023012511300549000_B26","volume-title":"Design and analysis of DNA microarray investigations.","author":"Simon","year":"2003"},{"key":"2023012511300549000_B27","doi-asserted-by":"crossref","first-page":"203","DOI":"10.1093\/bib\/bbr001","article-title":"Using cross-validation to evaluate predictive accuracy of survival risk classifiers based on high-dimensional data","volume":"12","author":"Simon","year":"2011","journal-title":"Brief. Bioinformatics"},{"key":"2023012511300549000_B28","doi-asserted-by":"crossref","first-page":"10869","DOI":"10.1073\/pnas.191367098","article-title":"Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications","volume":"98","author":"S\u00f8rlie","year":"2001","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023012511300549000_B29","doi-asserted-by":"crossref","first-page":"15545","DOI":"10.1073\/pnas.0506580102","article-title":"Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles","volume":"102","author":"Subramanian","year":"2005","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023012511300549000_B30","doi-asserted-by":"crossref","first-page":"267","DOI":"10.1111\/j.2517-6161.1996.tb02080.x","article-title":"Regression shrinkage and selection via the Lasso","volume":"58","author":"Tibshirani","year":"1996","journal-title":"J. R. Stat. Soc. Ser. B"},{"key":"2023012511300549000_B31","doi-asserted-by":"crossref","first-page":"385","DOI":"10.1002\/(SICI)1097-0258(19970228)16:4<385::AID-SIM380>3.0.CO;2-3","article-title":"The lasso method for variable selection in the Cox model","volume":"16","author":"Tibshirani","year":"1997","journal-title":"Stat. Med."},{"key":"2023012511300549000_B32","doi-asserted-by":"crossref","first-page":"21","DOI":"10.2202\/1544-6115.1438","article-title":"Univariate shrinkage in the Cox model for high dimensional data","volume":"8","author":"Tibshirani","year":"2009","journal-title":"Stat. Appl. Genet. Mol. Biol."},{"key":"2023012511300549000_B33","doi-asserted-by":"crossref","first-page":"1027","DOI":"10.1038\/nature05414","article-title":"An obesity-associated gut microbiome with increased capacity for energy harvest","volume":"444","author":"Turnbaugh","year":"2006","journal-title":"Nature"},{"key":"2023012511300549000_B34","doi-asserted-by":"crossref","first-page":"1999","DOI":"10.1056\/NEJMoa021967","article-title":"A gene-expression signature as a predictor of survival in breast cancer","volume":"347","author":"van de Vijver","year":"2002","journal-title":"N. Engl. J. Med."},{"key":"2023012511300549000_B35","doi-asserted-by":"crossref","DOI":"10.1007\/978-0-387-21706-2","volume-title":"Modern Applied Statistics with S.","author":"Venables","year":"2002"},{"key":"2023012511300549000_B36","doi-asserted-by":"crossref","first-page":"2305","DOI":"10.1002\/sim.4780122407","article-title":"Cross-validation in survival analysis","volume":"12","author":"Verweij","year":"1993","journal-title":"Stat. Med."},{"key":"2023012511300549000_B37","doi-asserted-by":"crossref","first-page":"2427","DOI":"10.1002\/sim.4780132307","article-title":"Penalized likelihood in Cox regression","volume":"13","author":"Verweij","year":"1994","journal-title":"Stat. Med."},{"key":"2023012511300549000_B38","first-page":"3005","article-title":"Molecular profiling of non-small cell lung cancer and correlation with disease-free survival","volume":"62","author":"Wigle","year":"2002","journal-title":"Cancer Res."},{"key":"2023012511300549000_B39","doi-asserted-by":"crossref","first-page":"49","DOI":"10.1111\/j.1467-9868.2005.00532.x","article-title":"Model selection and estimation in regression with grouped variables","volume":"68","author":"Yuan","year":"2006","journal-title":"J. R. Stat. Soc. Ser. B"},{"key":"2023012511300549000_B40","doi-asserted-by":"crossref","first-page":"301","DOI":"10.1111\/j.1467-9868.2005.00503.x","article-title":"Regularization and variable selection via the elastic net","volume":"67","author":"Zou","year":"2005","journal-title":"J. R. Stat. Soc. Ser. B"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/27\/24\/3399\/48861166\/bioinformatics_27_24_3399.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/27\/24\/3399\/48861166\/bioinformatics_27_24_3399.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,15]],"date-time":"2025-03-15T01:41:18Z","timestamp":1742002878000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/27\/24\/3399\/306905"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2011,10,24]]},"references-count":40,"journal-issue":{"issue":"24","published-print":{"date-parts":[[2011,12,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btr591","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2011,12,15]]},"published":{"date-parts":[[2011,10,24]]}}}