{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,27]],"date-time":"2026-02-27T06:17:12Z","timestamp":1772173032073,"version":"3.50.1"},"update-to":[{"DOI":"10.1371\/journal.pcbi.1009959","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2022,10,6]],"date-time":"2022-10-06T00:00:00Z","timestamp":1665014400000}}],"reference-count":47,"publisher":"Public Library of Science (PLoS)","issue":"9","license":[{"start":{"date-parts":[[2022,9,26]],"date-time":"2022-09-26T00:00:00Z","timestamp":1664150400000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Programma Operativo Nazionale Ricerca e Innovazione","award":["AIM1874325-2"],"award-info":[{"award-number":["AIM1874325-2"]}]}],"content-domain":{"domain":["www.ploscompbiol.org"],"crossmark-restriction":false},"short-container-title":["PLoS Comput Biol"],"abstract":"<jats:p>Previous studies for cancer biomarker discovery based on pre-diagnostic blood DNA methylation (DNAm) profiles, either ignore the explicit modeling of the Time To Diagnosis (TTD), or provide inconsistent results. This lack of consistency is likely due to the limitations of standard EWAS approaches, that model the effect of DNAm at CpG sites on TTD independently. In this work, we aim to identify blood DNAm profiles associated with TTD, with the aim to improve the reliability of the results, as well as their biological meaningfulness. We argue that a global approach to estimate CpG sites effect profile should capture the complex (potentially non-linear) relationships interplaying between sites. To prove our concept, we develop a new Deep Learning-based approach assessing the relevance of individual CpG Islands (i.e., assigning a weight to each site) in determining TTD while modeling their combined effect in a survival analysis scenario. The algorithm combines a tailored sampling procedure with DNAm sites agglomeration, deep non-linear survival modeling and SHapley Additive exPlanations (SHAP) values estimation to aid robustness of the derived effects profile. The proposed approach deals with the common complexities arising from epidemiological studies, such as small sample size, noise, and low signal-to-noise ratio of blood-derived DNAm. We apply our approach to a prospective case-control study on breast cancer nested in the EPIC Italy cohort and we perform weighted gene-set enrichment analyses to demonstrate the biological meaningfulness of the obtained results. We compared the results of Deep Survival EWAS with those of a traditional EWAS approach, demonstrating that our method performs better than the standard approach in identifying biologically relevant pathways.<\/jats:p>","DOI":"10.1371\/journal.pcbi.1009959","type":"journal-article","created":{"date-parts":[[2022,9,26]],"date-time":"2022-09-26T13:50:51Z","timestamp":1664200251000},"page":"e1009959","update-policy":"https:\/\/doi.org\/10.1371\/journal.pcbi.corrections_policy","source":"Crossref","is-referenced-by-count":6,"title":["A Deep Survival EWAS approach estimating risk profile based on pre-diagnostic DNA methylation: An application to breast cancer time to diagnosis"],"prefix":"10.1371","volume":"18","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-5393-8180","authenticated-orcid":true,"given":"Michela Carlotta","family":"Massi","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8398-3852","authenticated-orcid":true,"given":"Lorenzo","family":"Dominoni","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0165-1983","authenticated-orcid":true,"given":"Francesca","family":"Ieva","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7651-5452","authenticated-orcid":true,"given":"Giovanni","family":"Fiorito","sequence":"additional","affiliation":[]}],"member":"340","published-online":{"date-parts":[[2022,9,26]]},"reference":[{"issue":"2","key":"pcbi.1009959.ref001","doi-asserted-by":"crossref","first-page":"115","DOI":"10.1677\/erc.0.0080115","article-title":"DNA methylation in breast cancer","volume":"8","author":"X Yang","year":"2001","journal-title":"Endocrine-related cancer"},{"issue":"22","key":"pcbi.1009959.ref002","doi-asserted-by":"crossref","first-page":"4632","DOI":"10.1200\/JCO.2004.07.151","article-title":"DNA methylation and cancer","volume":"22","author":"PM Das","year":"2004","journal-title":"Journal of clinical oncology"},{"issue":"10","key":"pcbi.1009959.ref003","doi-asserted-by":"crossref","first-page":"1093","DOI":"10.1080\/15592294.2020.1747748","article-title":"Enrichment of CpG island shore region hypermethylation in epigenetic breast field cancerization","volume":"15","author":"ME Muse","year":"2020","journal-title":"Epigenetics"},{"issue":"1","key":"pcbi.1009959.ref004","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s12885-020-07543-4","article-title":"Epigenome-wide DNA methylation and risk of breast cancer: a systematic review","volume":"20","author":"K Ennour-Idrissi","year":"2020","journal-title":"BMC cancer"},{"issue":"1","key":"pcbi.1009959.ref005","first-page":"1","article-title":"DNA methylation-based biological age, genome-wide average DNA methylation, and conventional breast cancer risk factors","volume":"9","author":"M Chen","year":"2019","journal-title":"Scientific Reports"},{"key":"pcbi.1009959.ref006","first-page":"1","article-title":"Pre-diagnostic DNA methylation patterns differ according to mammographic breast density amongst women who subsequently develop breast cancer: a case-only study in the EPIC-Florence cohort","author":"S Caini","year":"2021","journal-title":"Breast Cancer Research and Treatment"},{"issue":"1","key":"pcbi.1009959.ref007","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s13058-019-1145-9","article-title":"Blood DNA methylation and breast cancer risk: a meta-analysis of four prospective cohort studies","volume":"21","author":"C Bodelon","year":"2019","journal-title":"Breast Cancer Research"},{"issue":"1-2","key":"pcbi.1009959.ref008","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1080\/15592294.2019.1644879","article-title":"Methodological challenges in constructing DNA methylation risk scores","volume":"15","author":"A H\u00fcls","year":"2020","journal-title":"Epigenetics"},{"issue":"10","key":"pcbi.1009959.ref009","doi-asserted-by":"crossref","first-page":"2026","DOI":"10.1158\/1055-9965.EPI-20-0451","article-title":"Stochastic epigenetic mutations are associated with risk of breast cancer, lung cancer, and mature b-cell neoplasms","volume":"29","author":"A Gagliardi","year":"2020","journal-title":"Cancer Epidemiology and Prevention Biomarkers"},{"issue":"1","key":"pcbi.1009959.ref010","first-page":"1","article-title":"MethylNet: an automated and modular deep learning approach for DNA methylation analysis","volume":"21","author":"JJ Levy","year":"2020","journal-title":"BMC bioinformatics"},{"key":"pcbi.1009959.ref011","doi-asserted-by":"crossref","first-page":"101976","DOI":"10.1016\/j.artmed.2020.101976","article-title":"Autoencoded DNA methylation data to predict breast cancer recurrence: Machine learning models and gene-weight significance","volume":"110","author":"L Mac\u00edas-Garc\u00eda","year":"2020","journal-title":"Artificial Intelligence in Medicine"},{"issue":"5","key":"pcbi.1009959.ref012","doi-asserted-by":"crossref","first-page":"e0226461","DOI":"10.1371\/journal.pone.0226461","article-title":"Predicting cancer origins with a DNA methylation-based deep neural network model","volume":"15","author":"C Zheng","year":"2020","journal-title":"PloS one"},{"issue":"8","key":"pcbi.1009959.ref013","doi-asserted-by":"crossref","first-page":"931","DOI":"10.3390\/genes11080931","article-title":"A linear regression and deep learning approach for detecting reliable genetic alterations in cancer using dna methylation and gene expression data","volume":"11","author":"S Mallik","year":"2020","journal-title":"Genes"},{"issue":"17","key":"pcbi.1009959.ref014","doi-asserted-by":"crossref","first-page":"2601","DOI":"10.1093\/bioinformatics\/btab140","article-title":"Integrative survival analysis of breast cancer with gene expression and DNA methylation data","volume":"37","author":"I Bichindaritz","year":"2021","journal-title":"Bioinformatics"},{"key":"pcbi.1009959.ref015","doi-asserted-by":"crossref","unstructured":"Azher ZL, Vaickus LJ, Salas LA, Christensen BC, Levy JJ. Development of biologically interpretable multimodal deep learning model for cancer prognosis prediction. In: Proceedings of the 37th ACM\/SIGAPP Symposium on Applied Computing; 2022. p. 636\u2013644.","DOI":"10.1145\/3477314.3507032"},{"issue":"1","key":"pcbi.1009959.ref016","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s13073-021-00930-x","article-title":"DeepProg: an ensemble of deep-learning and machine-learning models for prognosis prediction using multi-omics data","volume":"13","author":"OB Poirion","year":"2021","journal-title":"Genome medicine"},{"issue":"6","key":"pcbi.1009959.ref017","doi-asserted-by":"crossref","first-page":"1248","DOI":"10.1158\/1078-0432.CCR-17-0853","article-title":"Deep Learning\u2013Based Multi-Omics Integration Robustly Predicts Survival in Liver CancerUsing Deep Learning to Predict Liver Cancer Prognosis","volume":"24","author":"K Chaudhary","year":"2018","journal-title":"Clinical Cancer Research"},{"issue":"10","key":"pcbi.1009959.ref018","doi-asserted-by":"crossref","first-page":"778","DOI":"10.3390\/genes10100778","article-title":"DNA methylation markers for pan-cancer prediction by deep learning","volume":"10","author":"B Liu","year":"2019","journal-title":"Genes"},{"key":"pcbi.1009959.ref019","unstructured":"Lundberg SM, Lee SI. A unified approach to interpreting model predictions. In: Proceedings of the 31st international conference on neural information processing systems; 2017. p. 4768\u20134777."},{"issue":"1","key":"pcbi.1009959.ref020","doi-asserted-by":"crossref","first-page":"56","DOI":"10.1038\/s42256-019-0138-9","article-title":"From local explanations to global understanding with explainable AI for trees","volume":"2","author":"SM Lundberg","year":"2020","journal-title":"Nature machine intelligence"},{"key":"pcbi.1009959.ref021","doi-asserted-by":"crossref","unstructured":"Liu H, Wu X, Zhang S. Feature selection using hierarchical feature clustering. In: Proceedings of the 20th ACM international conference on Information and knowledge management; 2011. p. 979\u2013984.","DOI":"10.1145\/2063576.2063716"},{"issue":"1","key":"pcbi.1009959.ref022","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41598-017-11817-6","article-title":"Predicting clinical outcomes from large scale cancer genomic profiles with deep survival models","volume":"7","author":"S Yousefi","year":"2017","journal-title":"Scientific reports"},{"key":"pcbi.1009959.ref023","first-page":"2020","article-title":"MethylSPWNet and MethylCapsNet: Biologically Motivated Organization of DNAm Neural Network, Inspired by Capsule Networks","author":"JJ Levy","year":"2021","journal-title":"bioRxiv"},{"issue":"6","key":"pcbi.1009959.ref024","doi-asserted-by":"crossref","first-page":"996","DOI":"10.1101\/gr.229102","article-title":"The human genome browser at UCSC","volume":"12","author":"WJ Kent","year":"2002","journal-title":"Genome research"},{"key":"pcbi.1009959.ref025","unstructured":"Liu B, Udell M. Impact of Accuracy on Model Interpretations. arXiv preprint arXiv:201109903. 2020;."},{"issue":"3","key":"pcbi.1009959.ref026","doi-asserted-by":"crossref","first-page":"279","DOI":"10.1515\/sagmb-2014-0077","article-title":"Weighted Kolmogorov Smirnov testing: an alternative for gene set enrichment analysis","volume":"14","author":"K Charmpi","year":"2015","journal-title":"Statistical applications in genetics and molecular biology"},{"issue":"1","key":"pcbi.1009959.ref027","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s12885-019-5286-0","article-title":"Human papilloma virus and breast cancer: the role of inflammation and viral expressed proteins","volume":"19","author":"N Khodabandehlou","year":"2019","journal-title":"BMC cancer"},{"issue":"2","key":"pcbi.1009959.ref028","doi-asserted-by":"crossref","first-page":"195","DOI":"10.1159\/000502131","article-title":"Epstein-Barr virus infection and increased sporadic breast carcinoma risk: a meta-analysis","volume":"29","author":"J Su","year":"2020","journal-title":"Medical Principles and Practice"},{"key":"pcbi.1009959.ref029","doi-asserted-by":"crossref","DOI":"10.1155\/2020\/9258396","article-title":"Signal transduction pathways in breast cancer: the important role of PI3K\/Akt\/mTOR","volume":"2020","author":"MA Ortega","year":"2020","journal-title":"Journal of oncology"},{"issue":"4","key":"pcbi.1009959.ref030","doi-asserted-by":"crossref","first-page":"945","DOI":"10.1111\/bph.12486","article-title":"Calcium influx pathways in breast cancer: opportunities for pharmacological intervention","volume":"171","author":"I Azimi","year":"2014","journal-title":"British journal of pharmacology"},{"issue":"1","key":"pcbi.1009959.ref031","doi-asserted-by":"crossref","first-page":"87","DOI":"10.1093\/jnci\/djz065","article-title":"Blood DNA methylation and breast cancer: a prospective case-cohort analysis in the sister study","volume":"112","author":"Z Xu","year":"2020","journal-title":"JNCI: Journal of the National Cancer Institute"},{"issue":"4","key":"pcbi.1009959.ref032","doi-asserted-by":"crossref","first-page":"485","DOI":"10.1016\/j.ajhg.2014.02.011","article-title":"GeMes, Clusters of DNA Methylation under Genetic Control, Can Inform Genetic and Epigenetic Analysis of Disease","volume":"94","author":"Y Liu","year":"2014","journal-title":"The American Journal of Human Genetics"},{"key":"pcbi.1009959.ref033","article-title":"DNA methylation signatures of C-reactive protein associations with structural neuroimaging measures and major depressive disorder","author":"C Green","year":"2020","journal-title":"medRxiv"},{"issue":"1","key":"pcbi.1009959.ref034","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s13148-016-0292-4","article-title":"Smoking-associated DNA methylation markers predict lung cancer incidence","volume":"8","author":"Y Zhang","year":"2016","journal-title":"Clinical epigenetics"},{"key":"pcbi.1009959.ref035","doi-asserted-by":"crossref","unstructured":"Cappozzo A, McCrory C, Robinson O, Sterrantino AF, Sacerdote C, Krogh V, et al. A blood DNA methylation biomarker for predicting short-term risk of cardiovascular events. 2022;.","DOI":"10.21203\/rs.3.rs-1689354\/v1"},{"issue":"1","key":"pcbi.1009959.ref036","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/1471-2105-13-86","article-title":"DNA methylation arrays as surrogate measures of cell mixture distribution","volume":"13","author":"EA Houseman","year":"2012","journal-title":"BMC bioinformatics"},{"key":"pcbi.1009959.ref037","doi-asserted-by":"crossref","first-page":"816","DOI":"10.3389\/fgene.2019.00816","article-title":"In epigenomic studies, including cell-type adjustments in regression models can introduce multicollinearity, resulting in apparent reversal of direction of association","author":"SJ Barton","year":"2019","journal-title":"Frontiers in genetics"},{"issue":"1","key":"pcbi.1009959.ref038","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s13148-015-0104-2","article-title":"Epigenome-wide association study reveals decreased average methylation levels years before breast cancer diagnosis","volume":"7","author":"K van Veldhoven","year":"2015","journal-title":"Clinical epigenetics"},{"issue":"1","key":"pcbi.1009959.ref039","first-page":"1","article-title":"Don\u2019t dismiss logistic regression: the case for sensible extraction of interactions in the era of machine learning","volume":"20","author":"JJ Levy","year":"2020","journal-title":"BMC medical research methodology"},{"issue":"7","key":"pcbi.1009959.ref040","doi-asserted-by":"crossref","first-page":"2045","DOI":"10.18632\/aging.101900","article-title":"Socioeconomic position, lifestyle habits and biomarkers of epigenetic aging: a multi-cohort analysis","volume":"11","author":"G Fiorito","year":"2019","journal-title":"Aging (Albany NY)"},{"issue":"1a","key":"pcbi.1009959.ref041","doi-asserted-by":"crossref","first-page":"124","DOI":"10.1079\/PHN2005934","article-title":"The European prospective investigation into cancer and nutrition (EPIC)","volume":"9","author":"CA Gonzalez","year":"2006","journal-title":"Public health nutrition"},{"issue":"6","key":"pcbi.1009959.ref042","doi-asserted-by":"crossref","first-page":"586","DOI":"10.1177\/030089160308900602","article-title":"A molecular epidemiology project on diet and cancer: the EPIC-Italy Prospective Study. Design and baseline characteristics of participants","volume":"89","author":"D Palli","year":"2003","journal-title":"Tumori Journal"},{"issue":"3","key":"pcbi.1009959.ref043","doi-asserted-by":"crossref","first-page":"499","DOI":"10.1093\/biostatistics\/kxq005","article-title":"Redefining CpG islands using hidden Markov models","volume":"11","author":"H Wu","year":"2010","journal-title":"Biostatistics"},{"issue":"1","key":"pcbi.1009959.ref044","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s12874-018-0482-1","article-title":"DeepSurv: personalized treatment recommender system using a Cox proportional hazards deep neural network","volume":"18","author":"JL Katzman","year":"2018","journal-title":"BMC medical research methodology"},{"key":"pcbi.1009959.ref045","first-page":"307","volume-title":"Contributions to the Theory of Games","author":"LS Shapley","year":"1953"},{"issue":"18","key":"pcbi.1009959.ref046","doi-asserted-by":"crossref","first-page":"2543","DOI":"10.1001\/jama.1982.03320430047030","article-title":"Evaluating the yield of medical tests","volume":"247","author":"FE Harrell","year":"1982","journal-title":"Jama"},{"key":"pcbi.1009959.ref047","doi-asserted-by":"crossref","unstructured":"Kumar R, Vassilvitskii S. Generalized distances between rankings. In: Proceedings of the 19th international conference on World wide web; 2010. p. 571\u2013580.","DOI":"10.1145\/1772690.1772749"}],"updated-by":[{"DOI":"10.1371\/journal.pcbi.1009959","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2022,10,6]],"date-time":"2022-10-06T00:00:00Z","timestamp":1665014400000}}],"container-title":["PLOS Computational Biology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1009959","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,10,6]],"date-time":"2022-10-06T13:57:31Z","timestamp":1665064651000},"score":1,"resource":{"primary":{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1009959"}},"subtitle":[],"editor":[{"given":"Eric F.","family":"Lock","sequence":"first","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2022,9,26]]},"references-count":47,"journal-issue":{"issue":"9","published-online":{"date-parts":[[2022,9,26]]}},"URL":"https:\/\/doi.org\/10.1371\/journal.pcbi.1009959","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2022.02.25.481911","asserted-by":"object"}]},"ISSN":["1553-7358"],"issn-type":[{"value":"1553-7358","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,9,26]]}}}