{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,30]],"date-time":"2026-03-30T21:24:16Z","timestamp":1774905856110,"version":"3.50.1"},"reference-count":34,"publisher":"Oxford University Press (OUP)","issue":"2","license":[{"start":{"date-parts":[[2023,1,27]],"date-time":"2023-01-27T00:00:00Z","timestamp":1674777600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Fonds de recherche Qu\u00e9bec-Sant\u00e9","award":["267074"],"award-info":[{"award-number":["267074"]}]},{"DOI":"10.13039\/501100000038","name":"Natural Sciences and Engineering Research Council of Canada","doi-asserted-by":"publisher","award":["RGPIN-2019-06727"],"award-info":[{"award-number":["RGPIN-2019-06727"]}],"id":[{"id":"10.13039\/501100000038","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100000038","name":"Natural Sciences and Engineering Research Council of Canada","doi-asserted-by":"publisher","award":["RGPIN-2020-05133"],"award-info":[{"award-number":["RGPIN-2020-05133"]}],"id":[{"id":"10.13039\/501100000038","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2023,2,3]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Motivation<\/jats:title><jats:p>Sparse regularized regression methods are now widely used in genome-wide association studies (GWAS) to address the multiple testing burden that limits the discovery of potentially important predictors. Linear mixed models (LMMs) have become an attractive alternative to principal component (PC) adjustment for handling population structure and relatedness in high-dimensional penalized models. However, their use in binary trait GWAS relies on the invalid assumption that the residual variance does not depend on the estimated regression coefficients. 
Moreover, LMMs use a single spectral decomposition of the covariance matrix of the responses, which is no longer possible in generalized linear mixed models (GLMMs).<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>We introduce a new method called pglmm, a penalized GLMM that allows one to simultaneously select genetic markers and estimate their effects while accounting for between-individual correlations and the binary nature of the trait. We develop a computationally efficient algorithm based on penalized quasi-likelihood estimation that allows regularized mixed models to scale to high-dimensional binary trait GWAS. We show through simulations that when the dimensionality of the relatedness matrix is high, penalized LMM and logistic regression with PC adjustment fail to select important predictors and have inferior prediction accuracy compared to pglmm. Further, we demonstrate through the analysis of two polygenic binary traits in a subset of 6731 related individuals from the UK Biobank with 320K SNPs that our method can achieve higher predictive performance while also selecting fewer predictors than a sparse regularized logistic lasso with PC adjustment.<\/jats:p><\/jats:sec><jats:sec><jats:title>Availability and implementation<\/jats:title><jats:p>Our Julia package PenalizedGLMM.jl is publicly available on GitHub: https:\/\/github.com\/julstpierre\/PenalizedGLMM.<\/jats:p><\/jats:sec><jats:sec><jats:title>Supplementary information<\/jats:title><jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p><\/jats:sec>","DOI":"10.1093\/bioinformatics\/btad063","type":"journal-article","created":{"date-parts":[[2023,1,28]],"date-time":"2023-01-28T07:50:33Z","timestamp":1674892233000},"source":"Crossref","is-referenced-by-count":12,"title":["Efficient penalized generalized linear mixed models for variable selection and genetic risk prediction in high-dimensional 
data"],"prefix":"10.1093","volume":"39","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9627-576X","authenticated-orcid":false,"given":"Julien","family":"St-Pierre","sequence":"first","affiliation":[{"name":"Department of Epidemiology, Biostatistics and Occupational Health, McGill University , Montr\u00e9al, QC H3A 1G1, Canada"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9911-079X","authenticated-orcid":false,"given":"Karim","family":"Oualkacha","sequence":"additional","affiliation":[{"name":"D\u00e9partement de Math\u00e9matiques, Universit\u00e9 du Qu\u00e9bec \u00e0 Montr\u00e9al , Montr\u00e9al, QC H2X 3Y7, Canada"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8956-2509","authenticated-orcid":false,"given":"Sahir Rai","family":"Bhatnagar","sequence":"additional","affiliation":[{"name":"Department of Epidemiology, Biostatistics and Occupational Health, McGill University , Montr\u00e9al, QC H3A 1G1, Canada"}]}],"member":"286","published-online":{"date-parts":[[2023,1,27]]},"reference":[{"key":"2023020815005747700_btad063-B1","doi-asserted-by":"crossref","first-page":"65","DOI":"10.1137\/141000671","article-title":"Julia: a fresh approach to numerical computing","volume":"59","author":"Bezanson","year":"2017","journal-title":"SIAM Rev"},{"key":"2023020815005747700_btad063-B2","author":"Bhatnagar","year":"2020"},{"key":"2023020815005747700_btad063-B3","doi-asserted-by":"crossref","first-page":"e1008766","DOI":"10.1371\/journal.pgen.1008766","article-title":"Simultaneous SNP selection and adjustment for population structure in high dimensional prediction models","volume":"16","author":"Bhatnagar","year":"2020","journal-title":"PLoS Genet"},{"key":"2023020815005747700_btad063-B4","doi-asserted-by":"crossref","first-page":"641","DOI":"10.1007\/BF00049423","article-title":"Monotonicity of quadratic-approximation algorithms","volume":"40","author":"B\u00f6hning","year":"1988","journal-title":"Ann. Inst. Stat. 
Math"},{"key":"2023020815005747700_btad063-B5","doi-asserted-by":"crossref","first-page":"9","DOI":"10.1080\/01621459.1993.10594284","article-title":"Approximate inference in generalized linear mixed models","volume":"88","author":"Breslow","year":"1993","journal-title":"J. Am. Stat. Assoc"},{"key":"2023020815005747700_btad063-B6","doi-asserted-by":"crossref","first-page":"203","DOI":"10.1038\/s41586-018-0579-z","article-title":"The UK biobank resource with deep phenotyping and genomic data","volume":"562","author":"Bycroft","year":"2018","journal-title":"Nature"},{"key":"2023020815005747700_btad063-B7","doi-asserted-by":"crossref","first-page":"653","DOI":"10.1016\/j.ajhg.2016.02.012","article-title":"Control for population structure and relatedness for binary traits in genetic association studies via logistic mixed models","volume":"98","author":"Chen","year":"2016","journal-title":"Am. J. Hum. Genet"},{"key":"2023020815005747700_btad063-B8","doi-asserted-by":"crossref","first-page":"1348","DOI":"10.1198\/016214501753382273","article-title":"Variable selection via nonconcave penalized likelihood and its oracle properties","volume":"96","author":"Fan","year":"2001","journal-title":"J. Am. Stat. Assoc"},{"key":"2023020815005747700_btad063-B9","doi-asserted-by":"crossref","first-page":"1","DOI":"10.18637\/jss.v033.i01","article-title":"Regularization paths for generalized linear models via coordinate descent","volume":"33","author":"Friedman","year":"2010","journal-title":"J. Stat. 
Softw"},{"key":"2023020815005747700_btad063-B10","doi-asserted-by":"crossref","first-page":"1440","DOI":"10.2307\/2533274","article-title":"Average information REML: an efficient algorithm for variance parameter estimation in linear mixed models","volume":"51","author":"Gilmour","year":"1995","journal-title":"Biometrics"},{"key":"2023020815005747700_btad063-B11","doi-asserted-by":"crossref","first-page":"137","DOI":"10.1007\/s11222-012-9359-z","article-title":"Variable selection for generalized linear mixed models by L 1-penalized estimation","volume":"24","author":"Groll","year":"2014","journal-title":"Stat. Comput"},{"key":"2023020815005747700_btad063-B12","doi-asserted-by":"crossref","first-page":"1323","DOI":"10.1080\/01621459.2016.1215989","article-title":"Joint selection in mixed models using regularized PQL","volume":"112","author":"Hui","year":"2017","journal-title":"J. Am. Stat. Assoc"},{"key":"2023020815005747700_btad063-B13","doi-asserted-by":"crossref","first-page":"1749","DOI":"10.1038\/s41588-019-0530-8","article-title":"A resource-efficient tool for mixed model association analysis of large-scale data","volume":"51","author":"Jiang","year":"2019","journal-title":"Nat. Genet"},{"key":"2023020815005747700_btad063-B14","doi-asserted-by":"crossref","first-page":"348","DOI":"10.1038\/ng.548","article-title":"Variance component model to account for sample structure in genome-wide association studies","volume":"42","author":"Kang","year":"2010","journal-title":"Nat. 
Genet"},{"key":"2023020815005747700_btad063-B15","doi-asserted-by":"crossref","first-page":"516","DOI":"10.1093\/bioinformatics\/btq688","article-title":"The Bayesian Lasso for genome-wide association studies","volume":"27","author":"Li","year":"2011","journal-title":"Bioinformatics"},{"key":"2023020815005747700_btad063-B16","doi-asserted-by":"crossref","first-page":"747","DOI":"10.1038\/nature08494","article-title":"Finding the missing heritability of complex diseases","volume":"461","author":"Manolio","year":"2009","journal-title":"Nature"},{"key":"2023020815005747700_btad063-B17","doi-asserted-by":"crossref","first-page":"374","DOI":"10.1016\/j.csda.2006.12.019","article-title":"Relaxed Lasso","volume":"52","author":"Meinshausen","year":"2007","journal-title":"Comput. Stat. Data Anal"},{"key":"2023020815005747700_btad063-B18","doi-asserted-by":"crossref","first-page":"e1009241","DOI":"10.1371\/journal.pgen.1009241","article-title":"Estimating FST and kinship for arbitrary population structures","volume":"17","author":"Ochoa","year":"2021","journal-title":"PLoS Genet"},{"key":"2023020815005747700_btad063-B19","doi-asserted-by":"crossref","first-page":"456","DOI":"10.1016\/j.ajhg.2019.07.003","article-title":"Extreme polygenicity of complex traits is explained by negative selection","volume":"105","author":"O'Connor","year":"2019","journal-title":"Am. J. Hum. Genet"},{"key":"2023020815005747700_btad063-B20","doi-asserted-by":"crossref","DOI":"10.1186\/s12711-018-0373-2","article-title":"Large-scale genomic prediction using singular value decomposition of the genotype matrix","volume":"50","author":"\u00d8deg\u00e5rd","year":"2018","journal-title":"Genet. Select. Evol"},{"key":"2023020815005747700_btad063-B21","doi-asserted-by":"crossref","first-page":"904","DOI":"10.1038\/ng1847","article-title":"Principal components analysis corrects for stratification in genome-wide association studies","volume":"38","author":"Price","year":"2006","journal-title":"Nat. 
Genet"},{"key":"2023020815005747700_btad063-B22","doi-asserted-by":"crossref","first-page":"459","DOI":"10.1038\/nrg2813","article-title":"New approaches to population stratification in genome-wide association studies","volume":"11","author":"Price","year":"2010","journal-title":"Nat. Rev. Genet"},{"key":"2023020815005747700_btad063-B23","author":"Priv\u00e9","year":"2020"},{"key":"2023020815005747700_btad063-B24","doi-asserted-by":"crossref","first-page":"206","DOI":"10.1093\/bioinformatics\/bts669","article-title":"A Lasso multi-marker mixed model for association mapping with population structure correction","volume":"29","author":"Rakitsch","year":"2013","journal-title":"Bioinformatics"},{"key":"2023020815005747700_btad063-B25","doi-asserted-by":"crossref","first-page":"427","DOI":"10.1002\/gepi.22384","article-title":"Penalized linear mixed models for structured genetic data","author":"Reisetter","year":"2021","journal-title":"Genet. Epidemiol"},{"key":"2023020815005747700_btad063-B26","doi-asserted-by":"crossref","first-page":"267","DOI":"10.1111\/j.2517-6161.1996.tb02080.x","article-title":"Regression shrinkage and selection via the lasso","volume":"58","author":"Tibshirani","year":"1996","journal-title":"J. R. Stat. Soc. Ser. B (Methodological)"},{"key":"2023020815005747700_btad063-B27","doi-asserted-by":"crossref","first-page":"245","DOI":"10.1111\/j.1467-9868.2011.01004.x","article-title":"Strong rules for discarding predictors in lasso-type problems","volume":"74","author":"Tibshirani","year":"2011","journal-title":"J. R. Stat. Soc. Ser. B (Stat. Methodol.)"},{"key":"2023020815005747700_btad063-B28","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1016\/j.ajhg.2017.06.005","article-title":"10 years of GWAS discovery: biology, function, and translation","volume":"101","author":"Visscher","year":"2017","journal-title":"Am. J. Hum. 
Genet"},{"key":"2023020815005747700_btad063-B29","doi-asserted-by":"crossref","DOI":"10.1186\/s12859-019-2743-3","article-title":"AUTALASSO: an automatic adaptive LASSO for genome-wide prediction","volume":"20","author":"Waldmann","year":"2019","journal-title":"BMC Bioinformatics"},{"key":"2023020815005747700_btad063-B30","doi-asserted-by":"crossref","first-page":"76","DOI":"10.1016\/j.ajhg.2010.11.011","article-title":"GCTA: a tool for genome-wide complex trait analysis","volume":"88","author":"Yang","year":"2011","journal-title":"Am. J. Hum. Genet"},{"key":"2023020815005747700_btad063-B31","doi-asserted-by":"crossref","first-page":"203","DOI":"10.1038\/ng1702","article-title":"A unified mixed-model method for association mapping that accounts for multiple levels of relatedness","volume":"38","author":"Yu","year":"2006","journal-title":"Nat. Genet"},{"key":"2023020815005747700_btad063-B32","doi-asserted-by":"crossref","first-page":"894","DOI":"10.1214\/09-AOS729","article-title":"Nearly unbiased variable selection under minimax concave penalty","volume":"38","author":"Zhang","year":"2010","journal-title":"Ann. Stat"},{"key":"2023020815005747700_btad063-B33","doi-asserted-by":"crossref","first-page":"355","DOI":"10.1038\/ng.546","article-title":"Mixed linear model approach adapted for genome-wide association studies","volume":"42","author":"Zhang","year":"2010","journal-title":"Nat. Genet"},{"key":"2023020815005747700_btad063-B34","doi-asserted-by":"crossref","first-page":"1335","DOI":"10.1038\/s41588-018-0184-y","article-title":"Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies","volume":"50","author":"Zhou","year":"2018","journal-title":"Nat. 
Genet"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btad063\/48942682\/btad063.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/39\/2\/btad063\/49124083\/btad063.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/39\/2\/btad063\/49124083\/btad063.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,10,13]],"date-time":"2024-10-13T02:02:41Z","timestamp":1728784961000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btad063\/7008326"}},"subtitle":[],"editor":[{"given":"Russell","family":"Schwartz","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2023,1,27]]},"references-count":34,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2023,2,3]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btad063","relation":{},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2023,2,1]]},"published":{"date-parts":[[2023,1,27]]},"article-number":"btad063"}}