{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T20:33:32Z","timestamp":1772138012226,"version":"3.50.1"},"reference-count":49,"publisher":"Oxford University Press (OUP)","issue":"6","license":[{"start":{"date-parts":[[2021,7,13]],"date-time":"2021-07-13T00:00:00Z","timestamp":1626134400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"National Research Foundation of Korea and Ministry of Science and ICT of Republic of Korea","award":["2019R1A2C1007126"],"award-info":[{"award-number":["2019R1A2C1007126"]}]},{"name":"National Research Foundation of Korea and Ministry of Science and ICT of Republic of Korea","award":["2020R1A6A3A03037675"],"award-info":[{"award-number":["2020R1A6A3A03037675"]}]},{"DOI":"10.13039\/501100001459","name":"Singapore Ministry of Education","doi-asserted-by":"crossref","award":["MOE2018-T2-2-058"],"award-info":[{"award-number":["MOE2018-T2-2-058"]}],"id":[{"id":"10.13039\/501100001459","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100001349","name":"National Medical Research Council of Singapore","doi-asserted-by":"crossref","award":["NMRC-CG-M009"],"award-info":[{"award-number":["NMRC-CG-M009"]}],"id":[{"id":"10.13039\/501100001349","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,11,5]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Statistical analysis of ultrahigh-dimensional omics scale data has long depended on univariate hypothesis testing. With growing data features and samples, the obvious next step is to establish multivariable association analysis as a routine method to describe genotype\u2013phenotype association. 
Here we present ParProx, a state-of-the-art implementation to optimize overlapping and non-overlapping group lasso regression models for time-to-event and classification analysis, with selection of variables grouped by biological priors. ParProx enables multivariable model fitting for ultrahigh-dimensional data within an architecture for parallel or distributed computing via latent variable group representation. It thereby aims to produce interpretable regression models consistent with known biological relationships among independent variables, a property often explored post hoc, not during model estimation. Simulation studies clearly demonstrate the scalability of ParProx with graphics processing units in comparison to existing implementations. We illustrate the tool using three different omics data sets featuring moderate to large numbers of variables, where we use genomic regions and biological pathways as variable groups, rendering the selected independent variables directly interpretable with respect to those groups. 
ParProx is applicable to a wide range of studies using ultrahigh-dimensional omics data, from genome-wide association analysis to multi-omics studies where model estimation is computationally intractable with existing implementation.<\/jats:p>","DOI":"10.1093\/bib\/bbab256","type":"journal-article","created":{"date-parts":[[2021,6,17]],"date-time":"2021-06-17T15:10:09Z","timestamp":1623942609000},"source":"Crossref","is-referenced-by-count":4,"title":["Computationally scalable regression modeling for ultrahigh-dimensional omics data with ParProx"],"prefix":"10.1093","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9897-1993","authenticated-orcid":false,"given":"Seyoon","family":"Ko","sequence":"first","affiliation":[{"name":"Department of Statistics, Seoul National University, Republic of Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9260-0414","authenticated-orcid":false,"given":"Ginny X","family":"Li","sequence":"additional","affiliation":[{"name":"Department of Medicine, National University of Singapore, Singapore"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6687-3088","authenticated-orcid":false,"given":"Hyungwon","family":"Choi","sequence":"additional","affiliation":[{"name":"Department of Medicine, National University of Singapore, Singapore"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5597-9557","authenticated-orcid":false,"given":"Joong-Ho","family":"Won","sequence":"additional","affiliation":[{"name":"Department of Statistics, Seoul National University, Republic of Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2021,7,13]]},"reference":[{"key":"2021110815082607400_ref1","doi-asserted-by":"crossref","first-page":"9","DOI":"10.1038\/s42003-018-0261-x","article-title":"A scientometric review of genome-wide association 
studies","volume":"2","author":"Mills","year":"2019","journal-title":"Commun Biol"},{"key":"2021110815082607400_ref2","doi-asserted-by":"crossref","first-page":"400","DOI":"10.1016\/j.cell.2018.02.052","article-title":"An integrated TCGA pan-cancer clinical data resource to drive high-quality survival outcome analytics","volume":"173","author":"Liu","year":"2018","journal-title":"Cell"},{"key":"2021110815082607400_ref3","doi-asserted-by":"crossref","first-page":"302","DOI":"10.1214\/07-AOAS131","article-title":"Pathwise coordinate optimization","volume":"1","author":"Friedman","year":"2007","journal-title":"Ann Appl Stat"},{"key":"2021110815082607400_ref4","doi-asserted-by":"crossref","first-page":"397","DOI":"10.1080\/10618600.1998.10474784","article-title":"Penalized regressions: the bridge versus the lasso","volume":"7","author":"Fu","year":"1998","journal-title":"J Comput Graph Stat"},{"key":"2021110815082607400_ref5","doi-asserted-by":"crossref","first-page":"224","DOI":"10.1214\/07-AOAS147","article-title":"Coordinate descent algorithms for lasso penalized regression","volume":"2","author":"Wu","year":"2008","journal-title":"Ann Appl Stat"},{"key":"2021110815082607400_ref6","article-title":"Safe feature elimination for the LASSO and sparse supervised learning problems","author":"El Ghaoui","year":"2011","journal-title":"arXiv preprint"},{"key":"2021110815082607400_ref7","doi-asserted-by":"crossref","first-page":"245","DOI":"10.1111\/j.1467-9868.2011.01004.x","article-title":"Strong rules for discarding predictors in lasso-type problems","volume":"74","author":"Tibshirani","year":"2012","journal-title":"J R Stat Soc Series B Stat Methodology"},{"key":"2021110815082607400_ref8","doi-asserted-by":"crossref","first-page":"1","DOI":"10.18637\/jss.v033.i01","article-title":"Regularization paths for generalized linear models via coordinate descent","volume":"33","author":"Friedman","year":"2010","journal-title":"J Stat 
Softw"},{"key":"2021110815082607400_ref9","doi-asserted-by":"crossref","DOI":"10.1093\/biostatistics\/kxaa038","article-title":"Fast lasso method for large-scale and ultrahigh-dimensional Cox model with applications to UK biobank","author":"Li","year":"2020","journal-title":"Biostatistics"},{"key":"2021110815082607400_ref10","doi-asserted-by":"crossref","first-page":"e1009141","DOI":"10.1371\/journal.pgen.1009141","article-title":"A fast and scalable framework for large-scale and ultrahigh-dimensional sparse regression with application to the UK biobank","volume":"16","author":"Qian","year":"2020","journal-title":"PLoS Genet"},{"key":"2021110815082607400_ref11","doi-asserted-by":"crossref","DOI":"10.1093\/gigascience\/giaa044","article-title":"Iterative hard thresholding in genome-wide association studies: generalized linear models, prior weights, and double sparsity","volume":"9","author":"Chu","year":"2020","journal-title":"Gigascience"},{"key":"2021110815082607400_ref12","doi-asserted-by":"crossref","first-page":"173","DOI":"10.1007\/s11222-013-9424-2","article-title":"Group descent algorithms for nonconvex penalized linear and logistic regression models with grouped predictors","volume":"25","author":"Breheny","year":"2015","journal-title":"Stat Comput"},{"key":"2021110815082607400_ref13","doi-asserted-by":"crossref","first-page":"179","DOI":"10.4137\/CIN.S40043","article-title":"Overlapping group logistic regression with applications to genetic pathway selection","volume":"15","author":"Zeng","year":"2016","journal-title":"Cancer Informat"},{"key":"2021110815082607400_ref14","article-title":"High-performance statistical computing in the computing environments of the 2020s","author":"Ko","journal-title":"Stat Sci"},{"key":"2021110815082607400_ref15","first-page":"1","article-title":"Distributed optimization and statistical learning via the alternating direction method of multipliers","volume":"3","author":"Boyd","year":"2011","journal-title":"Found Trends Mach 
Learn"},{"key":"2021110815082607400_ref16","doi-asserted-by":"crossref","first-page":"271","DOI":"10.1016\/j.cels.2018.03.002","article-title":"Scalable Open Science approach for mutation calling of tumor exomes using multiple genomic pipelines","volume":"6","author":"Ellrott","year":"2018","journal-title":"Cell Syst"},{"key":"2021110815082607400_ref17","doi-asserted-by":"crossref","first-page":"1873","DOI":"10.1001\/jama.2011.593","article-title":"A genomic predictor of response and survival following taxane-anthracycline chemotherapy for invasive breast cancer","volume":"305","author":"Hatzis","year":"2011","journal-title":"JAMA"},{"key":"2021110815082607400_ref18","doi-asserted-by":"crossref","first-page":"1587","DOI":"10.1158\/1078-0432.CCR-12-1359","article-title":"Biomarker analysis of neoadjuvant doxorubicin\/cyclophosphamide followed by ixabepilone or paclitaxel in early-stage breast cancer","volume":"19","author":"Horak","year":"2013","journal-title":"Clin Cancer Res"},{"key":"2021110815082607400_ref19","doi-asserted-by":"crossref","first-page":"913","DOI":"10.1111\/j.1349-7006.2012.02231.x","article-title":"GSTP1 expression predicts poor pathological complete response to neoadjuvant chemotherapy in ER-negative breast cancer","volume":"103","author":"Miyake","year":"2012","journal-title":"Cancer Sci"},{"key":"2021110815082607400_ref20","doi-asserted-by":"crossref","first-page":"303","DOI":"10.1186\/s12916-015-0540-z","article-title":"Response and survival of breast cancer intrinsic subtypes following multi-agent neoadjuvant chemotherapy","volume":"13","author":"Prat","year":"2015","journal-title":"BMC Med"},{"key":"2021110815082607400_ref21","doi-asserted-by":"crossref","DOI":"10.1007\/978-0-387-21606-5","volume-title":"The elements of statistical learning: data mining, inference, and 
prediction","author":"Hastie","year":"2001"},{"key":"2021110815082607400_ref22","doi-asserted-by":"crossref","first-page":"187","DOI":"10.1111\/j.2517-6161.1972.tb00899.x","article-title":"Regression models and life-tables","volume":"34","author":"Cox","year":"1972","journal-title":"J R Stat Soc Ser B"},{"key":"2021110815082607400_ref23","first-page":"185","volume-title":"Fixed-point algorithms for inverse problems in science and engineering","author":"Combettes"},{"key":"2021110815082607400_ref24","first-page":"433","volume-title":"Proceedings of the 26th International Conference of Machine Learning","author":"Jacob","year":"2009"},{"key":"2021110815082607400_ref25","doi-asserted-by":"crossref","first-page":"49","DOI":"10.1111\/j.1467-9868.2005.00532.x","article-title":"Model selection and estimation in regression with grouped variables","volume":"68","author":"Yuan","year":"2006","journal-title":"J R Stat Soc Ser B"},{"key":"2021110815082607400_ref26","doi-asserted-by":"crossref","first-page":"719","DOI":"10.1214\/11-AOAS514","article-title":"Smoothing proximal gradient method for general structured sparse regression","volume":"6","author":"Chen","year":"2012","journal-title":"Ann Appl Stat"},{"key":"2021110815082607400_ref27","doi-asserted-by":"crossref","first-page":"821","DOI":"10.1080\/10618600.2019.1592757","article-title":"Easily parallelizable and distributable class of algorithms for structured sparsity, with optimal acceleration","volume":"28","author":"Ko","year":"2019","journal-title":"J Comput Graph Stat"},{"key":"2021110815082607400_ref28","doi-asserted-by":"crossref","first-page":"1483","DOI":"10.1126\/science.aab4082","article-title":"Somatic mutation in cancer and normal cells","volume":"349","author":"Martincorena","year":"2015","journal-title":"Science"},{"key":"2021110815082607400_ref29","doi-asserted-by":"crossref","first-page":"79","DOI":"10.1186\/s13073-017-0465-6","article-title":"Differential analysis between somatic mutation and germline 
variation profiles reveals cancer-related genes","volume":"9","author":"Przytycki","year":"2017","journal-title":"Genome Med"},{"key":"2021110815082607400_ref30","doi-asserted-by":"crossref","first-page":"934","DOI":"10.1002\/humu.23979","article-title":"A protein-centric approach for exome variant aggregation enables sensitive association analysis with clinical outcomes","volume":"41","author":"Li","year":"2020","journal-title":"Hum Mutat"},{"key":"2021110815082607400_ref31","doi-asserted-by":"crossref","first-page":"D512","DOI":"10.1093\/nar\/gku1267","article-title":"PhosphoSitePlus, 2014: mutations, PTMs and recalibrations","volume":"43","author":"Hornbeck","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"2021110815082607400_ref32","doi-asserted-by":"crossref","first-page":"D427","DOI":"10.1093\/nar\/gky995","article-title":"The Pfam protein families database in 2019","volume":"47","author":"El-Gebali","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"2021110815082607400_ref33","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1038\/75556","article-title":"Gene ontology: tool for the unification of biology. 
The Gene Ontology Consortium","volume":"25","author":"Ashburner","year":"2000","journal-title":"Nat Genet"},{"key":"2021110815082607400_ref34","doi-asserted-by":"crossref","first-page":"D712","DOI":"10.1093\/nar\/gkq1156","article-title":"ConsensusPathDB: toward a more complete picture of cell biology","volume":"39","author":"Kamburov","year":"2011","journal-title":"Nucleic Acids Res"},{"key":"2021110815082607400_ref35","doi-asserted-by":"crossref","first-page":"2013","DOI":"10.1214\/aos\/1074290335","article-title":"The positive false discovery rate: a Bayesian interpretation and the q-value","volume":"31","author":"Storey","year":"2003","journal-title":"Ann Stat"},{"key":"2021110815082607400_ref36","doi-asserted-by":"crossref","first-page":"405","DOI":"10.1186\/1471-2105-9-405","article-title":"iRefIndex: a consolidated protein interaction database with provenance","volume":"9","author":"Razick","year":"2008","journal-title":"BMC Bioinformatics"},{"key":"2021110815082607400_ref37","doi-asserted-by":"crossref","first-page":"505","DOI":"10.1038\/nature22366","article-title":"Architecture of the human interactome defines protein communities and disease networks","volume":"545","author":"Huttlin","year":"2017","journal-title":"Nature"},{"key":"2021110815082607400_ref38","doi-asserted-by":"crossref","first-page":"1327","DOI":"10.1016\/j.cell.2017.05.046","article-title":"
Comprehensive and integrative genomic characterization of hepatocellular carcinoma","volume":"169","author":"Cancer Genome Atlas Research Network","year":"2017","journal-title":"Cell"},{"key":"2021110815082607400_ref39","doi-asserted-by":"crossref","first-page":"385","DOI":"10.1002\/(SICI)1097-0258(19970228)16:4<385::AID-SIM380>3.0.CO;2-3","article-title":"The LASSO method for variable selection in the Cox model","volume":"16","author":"Tibshirani","year":"1997","journal-title":"Stat Med"},{"key":"2021110815082607400_ref40","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Mach Learn"},{"key":"2021110815082607400_ref41","doi-asserted-by":"crossref","first-page":"144","DOI":"10.1145\/130385.130401","volume-title":"Fifth Annual Workshop on Computational Learning Theory","author":"Boser","year":"1992"},{"key":"2021110815082607400_ref42","article-title":"DistStat.jl: towards unified programming for high-performance statistical computing environments in Julia","author":"Ko","year":"2020","journal-title":"arXiv preprint"},{"key":"2021110815082607400_ref43","doi-asserted-by":"crossref","first-page":"291","DOI":"10.1214\/20-STS784","article-title":"Stochastic approximation: from statistical origin to big-data, multidisciplinary applications","volume":"36","author":"Lai","year":"2021","journal-title":"Stat Sci"},{"key":"2021110815082607400_ref44","first-page":"1574","volume-title":"Proceedings of the 27th International Conference on Neural Information Processing Systems","author":"Nitanda","year":"2014"},{"key":"2021110815082607400_ref45","doi-asserted-by":"crossref","first-page":"891","DOI":"10.1007\/s00245-019-09617-7","article-title":"Convergence of stochastic proximal gradient algorithm","volume":"82","author":"Rosasco","year":"2020","journal-title":"Appl Math 
Optim"},{"key":"2021110815082607400_ref46","doi-asserted-by":"crossref","first-page":"2057","DOI":"10.1137\/140961791","article-title":"A proximal stochastic gradient method with progressive variance reduction","volume":"24","author":"Xiao","year":"2014","journal-title":"SIAM J Optim"},{"key":"2021110815082607400_ref47","doi-asserted-by":"crossref","first-page":"894","DOI":"10.1214\/09-AOS729","article-title":"Nearly unbiased variable selection under minimax concave penalty","volume":"38","author":"Zhang","year":"2010","journal-title":"Ann Stat"},{"key":"2021110815082607400_ref48","doi-asserted-by":"crossref","first-page":"1348","DOI":"10.1198\/016214501753382273","article-title":"Variable selection via nonconcave penalized likelihood and its Oracle properties","volume":"96","author":"Fan","year":"2001","journal-title":"J Am Stat Assoc"},{"key":"2021110815082607400_ref49","first-page":"2206","volume-title":"Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence","author":"Zhong","year":"2014"}],"container-title":["Briefings in 
Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/22\/6\/bbab256\/41089437\/bbab256.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/22\/6\/bbab256\/41089437\/bbab256.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,9,1]],"date-time":"2024-09-01T21:54:06Z","timestamp":1725227646000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbab256\/6319937"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,7,13]]},"references-count":49,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2021,11,5]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbab256","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2021.01.10.426142","asserted-by":"object"}]},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2021,11]]},"published":{"date-parts":[[2021,7,13]]},"article-number":"bbab256"}}