{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,3]],"date-time":"2026-04-03T02:01:03Z","timestamp":1775181663249,"version":"3.50.1"},"reference-count":39,"publisher":"Oxford University Press (OUP)","issue":"7","license":[{"start":{"date-parts":[[2018,9,1]],"date-time":"2018-09-01T00:00:00Z","timestamp":1535760000000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000005","name":"Department of Defense","doi-asserted-by":"publisher","award":["FA8721-05-C-0003"],"award-info":[{"award-number":["FA8721-05-C-0003"]}],"id":[{"id":"10.13039\/100000005","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["R01-GM093156"],"award-info":[{"award-number":["R01-GM093156"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["P30-DA035778"],"award-info":[{"award-number":["P30-DA035778"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2019,4,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Motivation<\/jats:title><jats:p>Association studies to discover links between genetic markers and phenotypes are central to bioinformatics. Methods of regularized regression, such as variants of the Lasso, are popular for this task. Despite the good predictive performance of these methods in the average case, they suffer from unstable selections of correlated variables and inconsistent selections of linearly dependent variables. 
Unfortunately, as we demonstrate empirically, such problematic situations of correlated and linearly dependent variables often exist in genomic datasets and lead to under-performance of classical methods of variable selection.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>To address these challenges, we propose the Precision Lasso. Precision Lasso is a Lasso variant that promotes sparse variable selection by regularization governed by the covariance and inverse covariance matrices of explanatory variables. We illustrate its capacity for stable and consistent variable selection in simulated data with highly correlated and linearly dependent variables. We then demonstrate the effectiveness of the Precision Lasso to select meaningful variables from transcriptomic profiles of breast cancer patients. Our results indicate that in settings with correlated and linearly dependent variables, the Precision Lasso outperforms popular methods of variable selection such as the Lasso, the Elastic Net and Minimax Concave Penalty (MCP) regression.<\/jats:p><\/jats:sec><jats:sec><jats:title>Availability and implementation<\/jats:title><jats:p>Software is available at https:\/\/github.com\/HaohanWang\/thePrecisionLasso.<\/jats:p><\/jats:sec><jats:sec><jats:title>Supplementary information<\/jats:title><jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p><\/jats:sec>","DOI":"10.1093\/bioinformatics\/bty750","type":"journal-article","created":{"date-parts":[[2018,9,1]],"date-time":"2018-09-01T04:01:06Z","timestamp":1535774466000},"page":"1181-1187","source":"Crossref","is-referenced-by-count":157,"title":["Precision Lasso: accounting for correlations and linear dependencies in high-dimensional genomic data"],"prefix":"10.1093","volume":"35","author":[{"given":"Haohan","family":"Wang","sequence":"first","affiliation":[{"name":"Language Technologies Institute, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, 
USA"}]},{"given":"Benjamin J","family":"Lengerich","sequence":"additional","affiliation":[{"name":"Department of Computer Science, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA"}]},{"given":"Bryon","family":"Aragam","sequence":"additional","affiliation":[{"name":"Department of Machine Learning, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA"}]},{"given":"Eric P","family":"Xing","sequence":"additional","affiliation":[{"name":"Language Technologies Institute, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA"},{"name":"Department of Computer Science, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA"},{"name":"Department of Machine Learning, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA"}]}],"member":"286","published-online":{"date-parts":[[2018,9,1]]},"reference":[{"key":"2023013107272543900_bty750-B1","doi-asserted-by":"crossref","first-page":"301","DOI":"10.1007\/s004400050210","article-title":"Risk bounds for model selection via penalization","volume":"113","author":"Barron","year":"1999","journal-title":"Probability Theory Relat. 
Fields"},{"key":"2023013107272543900_bty750-B2","first-page":"401","article-title":"The cBio cancer genomics portal: an open platform for exploring multidimensional cancer genomics data","volume-title":"Cancer Discov.","author":"Cerami","year":"2012"},{"key":"2023013107272543900_bty750-B3","volume-title":"Mathematical Methods of Statistics","author":"Cramer","year":"1946"},{"key":"2023013107272543900_bty750-B4","doi-asserted-by":"crossref","first-page":"57","DOI":"10.1007\/BF02678430","article-title":"Adaptive greedy approximations","volume":"13","author":"Davis","year":"1997","journal-title":"Constructive Approx"},{"key":"2023013107272543900_bty750-B5","volume-title":"Elements of Continuous Multivariate Analysis","author":"Dempster","year":"1969"},{"key":"2023013107272543900_bty750-B6","doi-asserted-by":"crossref","first-page":"1348","DOI":"10.1198\/016214501753382273","article-title":"Variable selection via nonconcave penalized likelihood and its oracle properties","volume":"96","author":"Fan","year":"2001","journal-title":"J. Am. Stat. Assoc"},{"key":"2023013107272543900_bty750-B7","doi-asserted-by":"crossref","first-page":"849","DOI":"10.1111\/j.1467-9868.2008.00674.x","article-title":"Sure independence screening for ultrahigh dimensional feature space","volume":"70","author":"Fan","year":"2008","journal-title":"J. R. Stat. Soc. Ser. B (Stat. 
Methodol.)"},{"key":"2023013107272543900_bty750-B8","doi-asserted-by":"crossref","first-page":"D805","DOI":"10.1093\/nar\/gku1075","article-title":"COSMIC: exploring the world\u2019s knowledge of somatic mutations in human cancer","volume":"43","author":"Forbes","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"2023013107272543900_bty750-B9","first-page":"0736","article-title":"A note on the group lasso and a sparse group lasso","volume":"1001","author":"Friedman","year":"2010","journal-title":"arXiv Preprint arXiv"},{"key":"2023013107272543900_bty750-B10","doi-asserted-by":"crossref","first-page":"185","DOI":"10.1137\/S0895479897326432","article-title":"Tikhonov regularization and total least squares","volume":"21","author":"Golub","year":"1999","journal-title":"SIAM J. Matrix Anal. Appl"},{"key":"2023013107272543900_bty750-B11","doi-asserted-by":"crossref","first-page":"1081","DOI":"10.1038\/nmeth.2642","article-title":"IntOGen-mutations identifies cancer drivers across tumor types","volume":"10","author":"Gonzalez-Perez","year":"2013","journal-title":"Nat. Methods"},{"key":"2023013107272543900_bty750-B12","first-page":"2187","article-title":"Trace lasso: a trace norm regularization for correlated designs","author":"Grave","year":"2011","journal-title":"Advances in Neural Information Processing Systems"},{"key":"2023013107272543900_bty750-B13","doi-asserted-by":"crossref","first-page":"1780","DOI":"10.1214\/11-AOAS455","article-title":"Bayesian variable selection regression for genome-wide association studies and other large-scale problems","volume":"5","author":"Guan","year":"2011","journal-title":"Ann. Appl. 
Stat"},{"key":"2023013107272543900_bty750-B14","doi-asserted-by":"crossref","first-page":"e0138903","DOI":"10.1371\/journal.pone.0138903","article-title":"Variable-selection emerges on top in empirical comparison of whole-genome complex-trait prediction methods","volume":"10","author":"Haws","year":"2015","journal-title":"PLoS One"},{"key":"2023013107272543900_bty750-B15","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1093\/bioinformatics\/btq600","article-title":"A variable selection method for genome-wide association studies","volume":"27","author":"He","year":"2011","journal-title":"Bioinformatics"},{"key":"2023013107272543900_bty750-B16","doi-asserted-by":"crossref","first-page":"55","DOI":"10.1080\/00401706.1970.10488634","article-title":"Ridge regression: biased estimation for nonorthogonal problems","volume":"12","author":"Hoerl","year":"1970","journal-title":"Technometrics"},{"key":"2023013107272543900_bty750-B17","first-page":"1603","article-title":"Adaptive lasso for sparse high-dimensional regression models","volume":"18","author":"Huang","year":"2008","journal-title":"Stat. 
Sinica"},{"key":"2023013107272543900_bty750-B18","doi-asserted-by":"crossref","first-page":"e1000587","DOI":"10.1371\/journal.pgen.1000587","article-title":"Statistical estimation of correlated genome associations to a quantitative trait network","volume":"5","author":"Kim","year":"2009","journal-title":"PLoS Genet"},{"key":"2023013107272543900_bty750-B19","doi-asserted-by":"crossref","first-page":"384","DOI":"10.1186\/1471-2105-10-384","article-title":"Regularized estimation of large-scale gene association networks using graphical Gaussian models","volume":"10","author":"Kr\u00e4mer","year":"2009","journal-title":"BMC Bioinformatics"},{"key":"2023013107272543900_bty750-B20","article-title":"Computational complexity, NP completeness and optimization duality: a survey","volume":"19","author":"Manyem","year":"2012","journal-title":"Electronic Colloquium on Computational Complexity (ECCC)"},{"key":"2023013107272543900_bty750-B21","doi-asserted-by":"crossref","first-page":"1436","DOI":"10.1214\/009053606000000281","article-title":"High-dimensional graphs and variable selection with the lasso","volume":"34","author":"Meinshausen","year":"2006","journal-title":"Ann. Stat"},{"key":"2023013107272543900_bty750-B22","doi-asserted-by":"crossref","first-page":"265","DOI":"10.1186\/1756-0500-5-265","article-title":"Human gene correlation analysis (HGCA): a tool for the identification of transcriptionally co-expressed genes","volume":"5","author":"Michalopoulos","year":"2012","journal-title":"BMC Res. 
Notes"},{"key":"2023013107272543900_bty750-B23","doi-asserted-by":"crossref","first-page":"S10","DOI":"10.1186\/1753-6561-6-S2-S10","article-title":"Genomic selection using regularized linear regression models: ridge regression, lasso, elastic net and their extensions","volume":"6","author":"Ogutu","year":"2012","journal-title":"BMC Proc"},{"key":"2023013107272543900_bty750-B24","doi-asserted-by":"crossref","first-page":"e49445","DOI":"10.1371\/journal.pone.0049445","article-title":"Finite adaptation and multistep moves in the Metropolis-Hastings algorithm for variable selection in genome-wide association analysis","volume":"7","author":"Peltola","year":"2012","journal-title":"PLoS One"},{"key":"2023013107272543900_bty750-B25","doi-asserted-by":"crossref","first-page":"817","DOI":"10.1093\/bioinformatics\/14.9.817","article-title":"MODELTEST: testing the model of DNA substitution","volume":"14","author":"Posada","year":"1998","journal-title":"Bioinformatics"},{"key":"2023013107272543900_bty750-B26","doi-asserted-by":"crossref","first-page":"50252","DOI":"10.18632\/oncotarget.17225","article-title":"Characterization of potential driver mutations involved in human breast cancer by computational approaches","volume":"8","author":"Rajendran","year":"2017","journal-title":"Oncotarget"},{"key":"2023013107272543900_bty750-B27","doi-asserted-by":"crossref","first-page":"1287","DOI":"10.1214\/09-AOS691","article-title":"High-dimensional Ising model selection using \u21131-regularized logistic regression","volume":"38","author":"Ravikumar","year":"2010","journal-title":"Ann. 
Stat"},{"key":"2023013107272543900_bty750-B28","first-page":"545","volume-title":"International Conference on Computational Learning Theory","author":"Srebro","year":"2005"},{"key":"2023013107272543900_bty750-B29","doi-asserted-by":"crossref","first-page":"267","DOI":"10.1111\/j.2517-6161.1996.tb02080.x","article-title":"Regression shrinkage and selection via the lasso","volume":"58","author":"Tibshirani","year":"1996","journal-title":"J. R. Stat. Soc. Ser. B (Methodological)"},{"key":"2023013107272543900_bty750-B30","doi-asserted-by":"crossref","DOI":"10.1109\/BIBM.2016.7822753","article-title":"Multiple confounders correction with regularized linear mixed effect models, with application in biological processes","author":"Wang","year":"2016","journal-title":"2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)"},{"key":"2023013107272543900_bty750-B31","author":"Wang","year":"2017"},{"key":"2023013107272543900_bty750-B32","doi-asserted-by":"crossref","first-page":"714","DOI":"10.1093\/bioinformatics\/btp041","article-title":"Genome-wide association analysis by lasso penalized logistic regression","volume":"25","author":"Wu","year":"2009","journal-title":"Bioinformatics"},{"key":"2023013107272543900_bty750-B33","doi-asserted-by":"crossref","first-page":"187","DOI":"10.1109\/TPAMI.2011.177","article-title":"Sparse algorithms are not stable: a no-free-lunch theorem","volume":"34","author":"Xu","year":"2012","journal-title":"IEEE Trans. Pattern Anal. Machine Intell"},{"key":"2023013107272543900_bty750-B34","doi-asserted-by":"crossref","first-page":"143","DOI":"10.1111\/j.1467-9868.2007.00581.x","article-title":"On the non-negative garrotte estimator","volume":"69","author":"Yuan","year":"2007","journal-title":"J. R. Stat. Soc. Ser. B (Stat. 
Methodol.)"},{"key":"2023013107272543900_bty750-B35","doi-asserted-by":"crossref","first-page":"894","DOI":"10.1214\/09-AOS729","article-title":"Nearly unbiased variable selection under minimax concave penalty","volume":"38","author":"Zhang","year":"2010","journal-title":"Ann. Stat"},{"key":"2023013107272543900_bty750-B36","doi-asserted-by":"crossref","first-page":"576","DOI":"10.1214\/12-STS399","article-title":"A general theory of concave regularization for high-dimensional sparse estimation problems","volume":"27","author":"Zhang","year":"2012","journal-title":"Stat. Sci"},{"key":"2023013107272543900_bty750-B37","first-page":"2541","article-title":"On model selection consistency of lasso","volume":"7","author":"Zhao","year":"2006","journal-title":"J. Machine Learn. Res"},{"key":"2023013107272543900_bty750-B38","doi-asserted-by":"crossref","first-page":"1418","DOI":"10.1198\/016214506000000735","article-title":"The adaptive lasso and its oracle properties","volume":"101","author":"Zou","year":"2006","journal-title":"J. Am. Stat. Assoc"},{"key":"2023013107272543900_bty750-B39","doi-asserted-by":"crossref","first-page":"301","DOI":"10.1111\/j.1467-9868.2005.00503.x","article-title":"Regularization and variable selection via the elastic net","volume":"67","author":"Zou","year":"2005","journal-title":"J. R. Stat. Soc. Ser. B (Stat. 
Methodol.)"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/35\/7\/1181\/48967734\/bioinformatics_35_7_1181.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/35\/7\/1181\/48967734\/bioinformatics_35_7_1181.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,7,9]],"date-time":"2024-07-09T20:53:59Z","timestamp":1720558439000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/35\/7\/1181\/5089232"}},"subtitle":[],"editor":[{"given":"Oliver","family":"Stegle","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2018,9,1]]},"references-count":39,"journal-issue":{"issue":"7","published-print":{"date-parts":[[2019,4,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/bty750","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2019,4,1]]},"published":{"date-parts":[[2018,9,1]]}}}