{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,4]],"date-time":"2026-04-04T21:52:33Z","timestamp":1775339553551,"version":"3.50.1"},"reference-count":19,"publisher":"Oxford University Press (OUP)","issue":"16","license":[{"start":{"date-parts":[[2021,2,22]],"date-time":"2021-02-22T00:00:00Z","timestamp":1613952000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/501100003032","name":"Association Nationale Recherche Technologie","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100003032","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,8,25]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Motivation<\/jats:title><jats:p>In genomic studies, identifying biomarkers associated with a variable of interest is a major concern in biomedical research. Regularized approaches are classically used to perform variable selection in high-dimensional linear models. However, these methods can fail in highly correlated settings.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>We propose a novel variable selection approach called WLasso, taking these correlations into account. It consists in rewriting the initial high-dimensional linear model to remove the correlation between the biomarkers (predictors) and in applying the generalized Lasso criterion. The performance of WLasso is assessed using synthetic data in several scenarios and compared with recent alternative approaches. The results show that when the biomarkers are highly correlated, WLasso outperforms the other approaches in sparse high-dimensional frameworks. 
The method is also illustrated on publicly available gene expression data in breast cancer.<\/jats:p><\/jats:sec><jats:sec><jats:title>Availability and implementation<\/jats:title><jats:p>Our method is implemented in the WLasso R package which is available from the Comprehensive R Archive Network (CRAN).<\/jats:p><\/jats:sec><jats:sec><jats:title>Supplementary information<\/jats:title><jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p><\/jats:sec>","DOI":"10.1093\/bioinformatics\/btab114","type":"journal-article","created":{"date-parts":[[2021,2,18]],"date-time":"2021-02-18T21:06:31Z","timestamp":1613682391000},"page":"2238-2244","source":"Crossref","is-referenced-by-count":12,"title":["A variable selection approach for highly correlated predictors in high-dimensional genomic data"],"prefix":"10.1093","volume":"37","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1234-9467","authenticated-orcid":false,"given":"Wencan","family":"Zhu","sequence":"first","affiliation":[{"name":"UMR MIA-Paris, AgroParisTech, INRAE \u2013 Universit\u00e9 Paris-Saclay , Paris 75005, France"},{"name":"Biostatistics and Programming Department, Sanofi R&D , Chilly Mazarin 91380, France"}]},{"given":"C\u00e9line","family":"L\u00e9vy-Leduc","sequence":"additional","affiliation":[{"name":"UMR MIA-Paris, AgroParisTech, INRAE \u2013 Universit\u00e9 Paris-Saclay , Paris 75005, France"}]},{"given":"Nils","family":"Tern\u00e8s","sequence":"additional","affiliation":[{"name":"Biostatistics and Programming Department, Sanofi R&D , Chilly Mazarin 91380, France"}]}],"member":"286","published-online":{"date-parts":[[2021,2,22]]},"reference":[{"key":"2023051609145315800_btab114-B1","doi-asserted-by":"crossref","first-page":"1348","DOI":"10.1198\/016214501753382273","article-title":"Variable selection via nonconcave penalized likelihood and its oracle properties","volume":"96","author":"Fan","year":"2001","journal-title":"J. Am. Stat. 
Assoc"},{"key":"2023051609145315800_btab114-B2","first-page":"595","article-title":"Statistical challenges with high dimensionality: feature selection in knowledge discovery","volume":"3","author":"Fan","year":"2006","journal-title":"Proc. Madrid Int. Congress Math"},{"key":"2023051609145315800_btab114-B3","volume-title":"Bioinformatics and Computational Biology Solutions Using R and Bioconductor (Statistics for Biology and Health)","author":"Gentleman","year":"2005"},{"key":"2023051609145315800_btab114-B4","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1002\/bimj.201700067","article-title":"Variable selection - a review and recommendations for the practicing statistician","volume":"60","author":"Heinze","year":"2018","journal-title":"Biometrical J"},{"key":"2023051609145315800_btab114-B5","doi-asserted-by":"crossref","first-page":"1150","DOI":"10.1214\/15-EJS1029","article-title":"Preconditioning the lasso for sign consistency","volume":"9","author":"Jia","year":"2015","journal-title":"Electron. J. Stat"},{"key":"2023051609145315800_btab114-B6","doi-asserted-by":"crossref","first-page":"S16","DOI":"10.1016\/j.metabol.2014.10.027","article-title":"Biomarkers for personalized oncology: recent advances and future challenges","volume":"64","author":"Kalia","year":"2015","journal-title":"Metabolism"},{"key":"2023051609145315800_btab114-B7","volume-title":"Handbook of Biological Statistics","author":"McDonald","year":"2009","edition":"2nd edn"},{"key":"2023051609145315800_btab114-B8","doi-asserted-by":"crossref","first-page":"265.","DOI":"10.1186\/1756-0500-5-265","article-title":"Human gene correlation analysis (HGCA): a tool for the identification of transcriptionally co-expressed genes","volume":"5","author":"Michalopoulos","year":"2012","journal-title":"BMC Res. 
Notes"},{"key":"2023051609145315800_btab114-B9","author":"Perrot-Dock\u00e8s","year":"2020"},{"key":"2023051609145315800_btab114-B10","doi-asserted-by":"crossref","first-page":"2507","DOI":"10.1093\/bioinformatics\/btm344","article-title":"A review of feature selection techniques in bioinformatics","volume":"23","author":"Saeys","year":"2007","journal-title":"Bioinformatics"},{"key":"2023051609145315800_btab114-B11","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s40537-018-0143-6","article-title":"Step away from stepwise","volume":"5","author":"Smith","year":"2018","journal-title":"J. Big Data"},{"key":"2023051609145315800_btab114-B12","doi-asserted-by":"crossref","first-page":"262","DOI":"10.1093\/jnci\/djj052","article-title":"Gene expression profiling in breast cancer: understanding the molecular basis of histologic grade to improve prognosis","volume":"98","author":"Sotiriou","year":"2006","journal-title":"JNCI J. Natl. Cancer Inst"},{"key":"2023051609145315800_btab114-B13","doi-asserted-by":"crossref","first-page":"267","DOI":"10.1111\/j.2517-6161.1996.tb02080.x","article-title":"Regression shrinkage and selection via the lasso","volume":"58","author":"Tibshirani","year":"1996","journal-title":"J. R. Stat. Soc. Ser. B (Stat. Methodol.)"},{"key":"2023051609145315800_btab114-B14","doi-asserted-by":"crossref","first-page":"1335","DOI":"10.1214\/11-AOS878","article-title":"The solution path of the generalized lasso","volume":"39","author":"Tibshirani","year":"2011","journal-title":"Ann. 
Stat"},{"key":"2023051609145315800_btab114-B15","doi-asserted-by":"crossref","first-page":"1181","DOI":"10.1093\/bioinformatics\/bty750","article-title":"Precision lasso: accounting for correlations and linear dependencies in high-dimensional genomic data","volume":"35","author":"Wang","year":"2019","journal-title":"Bioinformatics"},{"key":"2023051609145315800_btab114-B16","doi-asserted-by":"crossref","first-page":"589","DOI":"10.1111\/rssb.12127","article-title":"High dimensional ordinary least squares projection for screening variables","volume":"78","author":"Wang","year":"2016","journal-title":"J. R. Stat. Soc. Ser. B (Stat. Methodol.)"},{"key":"2023051609145315800_btab114-B17","doi-asserted-by":"crossref","first-page":"109647.","DOI":"10.1016\/j.biopha.2019.109647","article-title":"Estrogen receptor 1 and progesterone receptor are distinct biomarkers and prognostic factors in estrogen receptor-positive breast cancer: evidence from a bioinformatic analysis","volume":"121","author":"Wu","year":"2020","journal-title":"Biomed. Pharmacother"},{"key":"2023051609145315800_btab114-B18","first-page":"2541","article-title":"On model selection consistency of lasso","volume":"7","author":"Zhao","year":"2006","journal-title":"J. Mach. Learn. Res"},{"key":"2023051609145315800_btab114-B19","doi-asserted-by":"crossref","first-page":"301","DOI":"10.1111\/j.1467-9868.2005.00503.x","article-title":"Regularization and variable selection via the elastic net","volume":"67","author":"Zou","year":"2005","journal-title":"J. R. Stat. Soc. Ser. B (Stat. 
Methodol.)"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btab114\/36599563\/btab114.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/16\/2238\/50339640\/btab114.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/16\/2238\/50339640\/btab114.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,8,24]],"date-time":"2024-08-24T12:17:26Z","timestamp":1724501846000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/37\/16\/2238\/6146520"}},"subtitle":[],"editor":[{"given":"Alfonso","family":"Valencia","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2021,2,22]]},"references-count":19,"journal-issue":{"issue":"16","published-print":{"date-parts":[[2021,8,25]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btab114","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2021,8,15]]},"published":{"date-parts":[[2021,2,22]]}}}