{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,9]],"date-time":"2026-01-09T12:33:50Z","timestamp":1767962030484,"version":"3.49.0"},"reference-count":23,"publisher":"Oxford University Press (OUP)","issue":"10","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2012,5,15]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Motivation: DNA methylation is a molecular modification of DNA that plays crucial roles in regulation of gene expression. Particularly, CpG rich regions are frequently hypermethylated in cancer tissues, but not methylated in normal tissues. However, there are not many methodological literatures of case-control association studies for high-dimensional DNA methylation data, compared with those of microarray gene expression. One key feature of DNA methylation data is a grouped structure among CpG sites from a gene that are possibly highly correlated. In this article, we proposed a penalized logistic regression model for correlated DNA methylation CpG sites within genes from high-dimensional array data. Our regularization procedure is based on a combination of the l1 penalty and squared l2 penalty on degree-scaled differences of coefficients of CpG sites within one gene, so it induces both sparsity and smoothness with respect to the correlated regression coefficients. We combined the penalized procedure with a stability selection procedure such that a selection probability of each regression coefficient was provided which helps us make a stable and confident selection of methylation CpG sites that are possibly truly associated with the outcome.<\/jats:p><jats:p>Results: Using simulation studies we demonstrated that the proposed procedure outperforms existing main-stream regularization methods such as lasso and elastic-net when data is correlated within a group. 
We also applied our method to identify important CpG sites and corresponding genes for ovarian cancer from over 20 000 CpGs generated from Illumina Infinium HumanMethylation27K Beadchip. Some genes identified are potentially associated with cancers.<\/jats:p><jats:p>Contact: \u00a0sw2206@columbia.edu<\/jats:p><jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/bts145","type":"journal-article","created":{"date-parts":[[2012,3,31]],"date-time":"2012-03-31T00:24:47Z","timestamp":1333153487000},"page":"1368-1375","source":"Crossref","is-referenced-by-count":81,"title":["Penalized logistic regression for high-dimensional DNA methylation data with case-control studies"],"prefix":"10.1093","volume":"28","author":[{"given":"Hokeun","family":"Sun","sequence":"first","affiliation":[{"name":"Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY 10032, USA"}]},{"given":"Shuang","family":"Wang","sequence":"additional","affiliation":[{"name":"Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY 10032, USA"}]}],"member":"286","published-online":{"date-parts":[[2012,3,30]]},"reference":[{"key":"2023012512302541800_B1","doi-asserted-by":"crossref","first-page":"722","DOI":"10.1002\/gepi.20623","article-title":"Stability selection for genome-wide association","volume":"35","author":"Alexander","year":"2011","journal-title":"Genet. 
Epidemiol."},{"key":"2023012512302541800_B2","doi-asserted-by":"crossref","first-page":"383","DOI":"10.1101\/gr.4410706","article-title":"High-throughput DNA methylation profiling using universal bead arrays","volume":"16","author":"Bibikova","year":"2006","journal-title":"Genome Res."},{"key":"2023012512302541800_B3","doi-asserted-by":"crossref","first-page":"369","DOI":"10.4310\/SII.2009.v2.n3.a10","article-title":"Penalized methods for bi-level variable selection","volume":"2","author":"Breheny","year":"2009","journal-title":"Stat. Interface"},{"key":"2023012512302541800_B4","doi-asserted-by":"crossref","first-page":"302","DOI":"10.1214\/07-AOAS131","article-title":"Pathwise coordinate optimization","volume":"1","author":"Friedman","year":"2007","journal-title":"Ann. Appl. Stat."},{"key":"2023012512302541800_B5","doi-asserted-by":"crossref","first-page":"1","DOI":"10.18637\/jss.v033.i01","article-title":"Regularization paths for generalized linear models via coordinate descent","volume":"33","author":"Friedman","year":"2010","journal-title":"J. Stat. 
Softw."},{"key":"2023012512302541800_B6","doi-asserted-by":"crossref","first-page":"365","DOI":"10.1186\/1471-2105-9-365","article-title":"Model-based clustering of DNA methylation array data: a recursive-partitioning algorithm for high-dimensional data arising as a mixture of beta distributions","volume":"9","author":"Houseman","year":"2008","journal-title":"BMC Bioinformatics"},{"key":"2023012512302541800_B7","doi-asserted-by":"crossref","first-page":"2849","DOI":"10.1093\/bioinformatics\/btq553","article-title":"A statistical framework for Illumina DNA methylation arrays","volume":"26","author":"Kuan","year":"2010","journal-title":"Bioinformatics"},{"key":"2023012512302541800_B8","doi-asserted-by":"crossref","first-page":"1175","DOI":"10.1093\/bioinformatics\/btn081","article-title":"Network-constrained regularization and variable selection for analysis of genomic data","volume":"24","author":"Li","year":"2008","journal-title":"Bioinformatics"},{"key":"2023012512302541800_B9","doi-asserted-by":"crossref","first-page":"1498","DOI":"10.1214\/10-AOAS332","article-title":"Variable selection and regression analysis for covariates with a graphical structure with an application to genomics","volume":"4","author":"Li","year":"2010","journal-title":"Ann. Appl. 
Stat."},{"key":"2023012512302541800_B10","first-page":"5001","article-title":"Myeloperoxidase genetic polymorphism and lung cancer risk","volume":"57","author":"London","year":"1997","journal-title":"Cancer Res."},{"key":"2023012512302541800_B11","doi-asserted-by":"crossref","first-page":"416","DOI":"10.1093\/carcin\/bgp006","article-title":"Epigenetic profiling reveals etiologically distinct patterns of DNA methylation in head and neck squamous cell carcinoma","volume":"30","author":"Marsit","year":"2009","journal-title":"Carcinogenesis"},{"key":"2023012512302541800_B12","doi-asserted-by":"crossref","first-page":"53","DOI":"10.1111\/j.1467-9868.2007.00627.x","article-title":"The group lasso for logistic regression","volume":"70","author":"Meier","year":"2008","journal-title":"J. Roy. Stat. Soc. B"},{"key":"2023012512302541800_B13","doi-asserted-by":"crossref","first-page":"417","DOI":"10.1111\/j.1467-9868.2010.00740.x","article-title":"Stability selection","volume":"72","author":"Meinshausen","year":"2010","journal-title":"J. Roy. Stat. Soc. 
B"},{"key":"2023012512302541800_B14","article-title":"Genome-wide DNA methylation profiles in hepatocellular carcinoma","volume-title":"Hepatology","author":"Shen","year":"2011"},{"key":"2023012512302541800_B15","doi-asserted-by":"crossref","first-page":"1896","DOI":"10.1093\/bioinformatics\/bth176","article-title":"A comparison of cluster analysis methods using DNA methylation data","volume":"20","author":"Siegmund","year":"2004","journal-title":"Bioinformatics"},{"key":"2023012512302541800_B16","doi-asserted-by":"crossref","first-page":"332","DOI":"10.1101\/gr.103606.109","article-title":"Age-dependent DNA methylation of genes that are suppressed in stem cells is a hallmark of cancer","volume":"20","author":"Teschendorff","year":"2010","journal-title":"Genome Res."},{"key":"2023012512302541800_B17","doi-asserted-by":"crossref","first-page":"267","DOI":"10.1111\/j.2517-6161.1996.tb02080.x","article-title":"Regression shrinkage and selection via the lasso","volume":"58","author":"Tibshirani","year":"1996","journal-title":"J. Roy. Stat. Soc. B"},{"key":"2023012512302541800_B18","doi-asserted-by":"crossref","first-page":"91","DOI":"10.1111\/j.1467-9868.2005.00490.x","article-title":"Sparsity and smoothness via the fused lasso","volume":"67","author":"Tibshirani","year":"2005","journal-title":"J. Roy. Stat. Soc. B"},{"key":"2023012512302541800_B19","doi-asserted-by":"crossref","first-page":"686","DOI":"10.1002\/gepi.20619","article-title":"Method to detect differentially methylated loci with case-control designs using Illumina arrays","volume":"35","author":"Wang","year":"2011","journal-title":"Genet. 
Epidemiol."},{"key":"2023012512302541800_B20","doi-asserted-by":"crossref","first-page":"714","DOI":"10.1093\/bioinformatics\/btp041","article-title":"Genome-wide association analysis by lasso penalized logistic regression","volume":"25","author":"Wu","year":"2009","journal-title":"Bioinformatics"},{"key":"2023012512302541800_B21","doi-asserted-by":"crossref","first-page":"1470","DOI":"10.1093\/bioinformatics\/btp167","article-title":"KEGGgraph: a graph approach to KEGG PATHWAY in R and Bioconductor","volume":"25","author":"Zhang","year":"2009","journal-title":"Bioinformatics"},{"key":"2023012512302541800_B22","doi-asserted-by":"crossref","first-page":"2375","DOI":"10.1093\/bioinformatics\/btq448","article-title":"Association screening of common and rare genetic variants by penalized regression","volume":"26","author":"Zhou","year":"2010","journal-title":"Bioinformatics"},{"key":"2023012512302541800_B23","doi-asserted-by":"crossref","first-page":"301","DOI":"10.1111\/j.1467-9868.2005.00503.x","article-title":"Regularization and variable selection via the elastic net","volume":"67","author":"Zou","year":"2005","journal-title":"J. Roy. Stat. Soc. 
B"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/28\/10\/1368\/48864252\/bioinformatics_28_10_1368.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/28\/10\/1368\/48864252\/bioinformatics_28_10_1368.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,4,22]],"date-time":"2024-04-22T12:36:20Z","timestamp":1713789380000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/28\/10\/1368\/212244"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,3,30]]},"references-count":23,"journal-issue":{"issue":"10","published-print":{"date-parts":[[2012,5,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/bts145","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2012,5,15]]},"published":{"date-parts":[[2012,3,30]]}}}