{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,28]],"date-time":"2026-03-28T02:27:54Z","timestamp":1774664874695,"version":"3.50.1"},"reference-count":26,"publisher":"Oxford University Press (OUP)","issue":"22","license":[{"start":{"date-parts":[[2021,6,19]],"date-time":"2021-06-19T00:00:00Z","timestamp":1624060800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"name":"Funai Foundation for Information Technology"},{"name":"National Institute of Health"},{"name":"Multi and Trans-ethnic Mapping of Mendelian and Complex Diseases","award":["5U01 HG009080"],"award-info":[{"award-number":["5U01 HG009080"]}]},{"DOI":"10.13039\/100000051","name":"National Human Genome Research Institute","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000051","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["R01HG010140"],"award-info":[{"award-number":["R01HG010140"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"NIH","doi-asserted-by":"publisher","award":["5R01 EB001988-16"],"award-info":[{"award-number":["5R01 EB001988-16"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"NSF","doi-asserted-by":"publisher","award":["19 DMS1208164"],"award-info":[{"award-number":["19 DMS1208164"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"NSF","doi-asserted-by":"publisher","award":["DMS-1407548"],"award-info":[{"award-number":["DMS-1407548"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["5R01 EB 001988-21"],"award-info":[{"award-number":["5R01 EB 001988-21"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"name":"UK Biobank Resource","award":["24983"],"award-info":[{"award-number":["24983"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,11,18]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Large-scale and high-dimensional genome sequencing data poses computational challenges. General-purpose optimization tools are usually not optimal in terms of computational and memory performance for genetic data.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>We develop two efficient solvers for optimization problems arising from large-scale regularized regressions on millions of genetic variants sequenced from hundreds of thousands of individuals. These genetic variants are encoded by the values in the set {0,1,2,NA}. We take advantage of this fact and use two bits to represent each entry in a genetic matrix, which reduces memory requirement by a factor of 32 compared to a double precision floating point representation. Using this representation, we implemented an iteratively reweighted least square algorithm to solve Lasso regressions on genetic matrices, which we name snpnet-2.0. When the dataset contains many rare variants, the predictors can be encoded in a sparse matrix. We utilize the sparsity in the predictor matrix to further reduce memory requirement and computational speed. Our sparse genetic matrix implementation uses both the compact two-bit representation and a simplified version of compressed sparse block format so that matrix-vector multiplications can be effectively parallelized on multiple CPU cores. To demonstrate the effectiveness of this representation, we implement an accelerated proximal gradient method to solve group Lasso on these sparse genetic matrices. This solver is named sparse-snpnet, and will also be included as part of snpnet R package. Our implementation is able to solve Lasso and group Lasso, linear, logistic and Cox regression problems on sparse genetic matrices that contain 1\u200a000\u200a000 variants and almost 100\u200a000 individuals within 10\u2009min and using less than 32GB of memory.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>https:\/\/github.com\/rivas-lab\/snpnet\/tree\/compact.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btab452","type":"journal-article","created":{"date-parts":[[2021,6,16]],"date-time":"2021-06-16T07:12:37Z","timestamp":1623827557000},"page":"4148-4155","source":"Crossref","is-referenced-by-count":20,"title":["Fast numerical optimization for genome sequencing data in population biobanks"],"prefix":"10.1093","volume":"37","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5152-7086","authenticated-orcid":false,"given":"Ruilin","family":"Li","sequence":"first","affiliation":[{"name":"Institute for Computational and Mathematical Engineering, Stanford University , Stanford, CA 94305, USA"}]},{"given":"Christopher","family":"Chang","sequence":"additional","affiliation":[{"name":"Grail, Inc , Menlo Park, CA 94025, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9759-157X","authenticated-orcid":false,"given":"Yosuke","family":"Tanigawa","sequence":"additional","affiliation":[{"name":"Department of Biomedical Data Science, Stanford University , Stanford, CA 94305, USA"}]},{"given":"Balasubramanian","family":"Narasimhan","sequence":"additional","affiliation":[{"name":"Department of Biomedical Data Science, Stanford University , Stanford, CA 94305, USA"},{"name":"Department of Statistics, Stanford University , Stanford, CA 94305, USA"}]},{"given":"Trevor","family":"Hastie","sequence":"additional","affiliation":[{"name":"Department of Biomedical Data Science, Stanford University , Stanford, CA 94305, USA"},{"name":"Department of Statistics, Stanford University , Stanford, CA 94305, USA"}]},{"given":"Robert","family":"Tibshirani","sequence":"additional","affiliation":[{"name":"Department of Biomedical Data Science, Stanford University , Stanford, CA 94305, USA"},{"name":"Department of Statistics, Stanford University , Stanford, CA 94305, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1457-9925","authenticated-orcid":false,"given":"Manuel A","family":"Rivas","sequence":"additional","affiliation":[{"name":"Department of Statistics, Stanford University , Stanford, CA 94305, USA"}]}],"member":"286","published-online":{"date-parts":[[2021,6,19]]},"reference":[{"key":"2023051701223885200_btab452-B1","doi-asserted-by":"crossref","first-page":"373","DOI":"10.1016\/j.ajhg.2019.07.001","article-title":"Phenome-wide burden of copy-number variation in the UK biobank","volume":"105","author":"Aguirre","year":"2019","journal-title":"Am. J. Hum. Genet"},{"key":"2023051701223885200_btab452-B2","doi-asserted-by":"crossref","first-page":"183","DOI":"10.1137\/080716542","article-title":"A fast iterative shrinkage-thresholding algorithm for linear inverse problems","volume":"2","author":"Beck","year":"2009","journal-title":"SIAM J. Img. Sci"},{"key":"2023051701223885200_btab452-B3","first-page":"233","author":"Bulu\u00e7","year":"2009"},{"key":"2023051701223885200_btab452-B4","doi-asserted-by":"crossref","DOI":"10.1186\/s13742-015-0047-8","article-title":"Second-generation plink: rising to the challenge of larger and richer datasets","volume":"4","author":"Chang","year":"2015","journal-title":"GigaScience"},{"key":"2023051701223885200_btab452-B5","doi-asserted-by":"crossref","first-page":"187","DOI":"10.1111\/j.2517-6161.1972.tb00899.x","article-title":"Regression models and life-tables","volume":"34","author":"Cox","year":"1972","journal-title":"J. R. Stat. Soc. Series B"},{"key":"2023051701223885200_btab452-B6","doi-asserted-by":"crossref","first-page":"1413","DOI":"10.1002\/cpa.20042","article-title":"An iterative thresholding algorithm for linear inverse problems with a sparsity constraint","volume":"57","author":"Daubechies","year":"2004","journal-title":"Commun. Pure Appl. Math"},{"key":"2023051701223885200_btab452-B7","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41467-018-03910-9","article-title":"Medical relevance of protein-truncating variants across 337,205 individuals in the UK biobank study","volume":"9","author":"DeBoever","year":"2018","journal-title":"Nat. Commun"},{"key":"2023051701223885200_btab452-B8","doi-asserted-by":"crossref","first-page":"1","DOI":"10.18637\/jss.v033.i01","article-title":"Regularization paths for generalized linear models via coordinate descent","volume":"33","author":"Friedman","year":"2010","journal-title":"J. Stat. Software"},{"key":"2023051701223885200_btab452-B9","doi-asserted-by":"crossref","first-page":"1776","DOI":"10.1038\/s41467-019-09718-5","article-title":"Polygenic prediction via Bayesian regression and continuous shrinkage priors","volume":"10","author":"Ge","year":"2019","journal-title":"Nat. Commun"},{"key":"2023051701223885200_btab452-B10","first-page":"297","article-title":"Generalized additive models","volume":"1","author":"Hastie","year":"1986","journal-title":"Stat. Sci"},{"key":"2023051701223885200_btab452-B11","article-title":"Fast Lasso method for large-scale and ultrahigh-dimensional Cox model with applications to UK Biobank","author":"Li","year":"2020","journal-title":"Biostatistics."},{"key":"2023051701223885200_btab452-B12","doi-asserted-by":"crossref","first-page":"5086","DOI":"10.1038\/s41467-019-12653-0","article-title":"Improved polygenic prediction by Bayesian multiple regression on summary statistics","volume":"10","author":"Lloyd-Jones","year":"2019","journal-title":"Nat. Commun"},{"key":"2023051701223885200_btab452-B13","doi-asserted-by":"crossref","first-page":"284","DOI":"10.1038\/ng.3190","article-title":"Efficient Bayesian mixed-model analysis increases association power in large cohorts","volume":"47","author":"Loh","year":"2015","journal-title":"Nat. Genet"},{"key":"2023051701223885200_btab452-B14","volume-title":"A Computer Oriented Geodetic Data Base and a New Technique in File Sequencing","author":"Morton","year":"1966"},{"key":"2023051701223885200_btab452-B15","first-page":"543","article-title":"A method for solving the convex programming problem with convergence rate","volume":"269","author":"Nesterov","year":"1983","journal-title":"Proc. USSR Acad. Sci"},{"key":"2023051701223885200_btab452-B16","first-page":"2781","article-title":"Efficient analysis of large-scale genome-wide data with two R packages: bigstatsr and bigsnpr","volume":"34","author":"Priv\u00e9","year":"2018","journal-title":"Bioinformatics (Oxford, England)"},{"key":"2023051701223885200_btab452-B17","doi-asserted-by":"crossref","first-page":"5424","DOI":"10.1093\/bioinformatics\/btaa1029","article-title":"LDpred2: better, faster, stronger","volume":"36","author":"Priv\u00e9","year":"2020","journal-title":"Bioinformatics"},{"key":"2023051701223885200_btab452-B18","doi-asserted-by":"crossref","first-page":"e1009141","DOI":"10.1371\/journal.pgen.1009141","article-title":"A fast and scalable framework for large-scale and ultrahigh-dimensional sparse regression with application to the UK biobank","volume":"16","author":"Qian","year":"2020","journal-title":"PLoS Genet"},{"key":"2023051701223885200_btab452-B19","doi-asserted-by":"crossref","first-page":"1","DOI":"10.18637\/jss.v039.i05","article-title":"Regularization paths for cox\u2019s proportional hazards model via coordinate descent","volume":"39","author":"Simon","year":"2011","journal-title":"J. Stat. Software"},{"key":"2023051701223885200_btab452-B20","doi-asserted-by":"crossref","first-page":"185","DOI":"10.1038\/s41588-020-00757-z","article-title":"Genetics of 38 blood and urine biomarkers in the UK biobank","volume":"53","author":"Sinnott-Armstrong","year":"2021","journal-title":"Nat. Genet"},{"key":"2023051701223885200_btab452-B21","doi-asserted-by":"crossref","first-page":"e1001779","DOI":"10.1371\/journal.pmed.1001779","article-title":"UK biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age","volume":"12","author":"Sudlow","year":"2015","journal-title":"PLoS Medicine"},{"key":"2023051701223885200_btab452-B22","author":"Szustakowski","year":"2020"},{"key":"2023051701223885200_btab452-B23","doi-asserted-by":"crossref","first-page":"267","DOI":"10.1111\/j.2517-6161.1996.tb02080.x","article-title":"Regression shrinkage and selection via the Lasso","volume":"58","author":"Tibshirani","year":"1996","journal-title":"J. R. Stat. Soc. Series B (Methodological)"},{"key":"2023051701223885200_btab452-B24","author":"Venkataraman","year":"2020"},{"key":"2023051701223885200_btab452-B25","doi-asserted-by":"crossref","first-page":"49","DOI":"10.1111\/j.1467-9868.2005.00532.x","article-title":"Model selection and estimation in regression with grouped variables","volume":"68","author":"Yuan","year":"2006","journal-title":"J. R. Stat. Soc. Series B"},{"key":"2023051701223885200_btab452-B26","doi-asserted-by":"crossref","first-page":"301","DOI":"10.1111\/j.1467-9868.2005.00503.x","article-title":"Regularization and variable selection via the elastic net","volume":"67","author":"Zou","year":"2005","journal-title":"J. R. Stat. Soc. Series B (Statistical Methodology)"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btab452\/39303155\/btab452.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/22\/4148\/50335060\/btab452.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/22\/4148\/50335060\/btab452.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,9,1]],"date-time":"2024-09-01T18:31:41Z","timestamp":1725215501000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/37\/22\/4148\/6306404"}},"subtitle":[],"editor":[{"given":"Russell","family":"Schwartz","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2021,6,19]]},"references-count":26,"journal-issue":{"issue":"22","published-print":{"date-parts":[[2021,11,18]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btab452","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2021.02.14.431030","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2021,11,15]]},"published":{"date-parts":[[2021,6,19]]}}}