{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,27]],"date-time":"2026-02-27T06:17:53Z","timestamp":1772173073498,"version":"3.50.1"},"update-to":[{"DOI":"10.1371\/journal.pcbi.1010328","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2022,7,27]],"date-time":"2022-07-27T00:00:00Z","timestamp":1658880000000}}],"reference-count":52,"publisher":"Public Library of Science (PLoS)","issue":"7","license":[{"start":{"date-parts":[[2022,7,15]],"date-time":"2022-07-15T00:00:00Z","timestamp":1657843200000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["82173632"],"award-info":[{"award-number":["82173632"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["81903418"],"award-info":[{"award-number":["81903418"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Early Career Research Excellence Award from UoA"},{"DOI":"10.13039\/501100009193","name":"Marsden Fund","doi-asserted-by":"publisher","award":["19-UOA-209"],"award-info":[{"award-number":["19-UOA-209"]}],"id":[{"id":"10.13039\/501100009193","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Applied Basic Research Program of Shanxi Province of China","award":["201801D221399"],"award-info":[{"award-number":["201801D221399"]}]}],"content-domain":{"domain":["www.ploscompbiol.org"],"crossmark-restriction":false},"short-container-title":["PLoS Comput Biol"],"abstract":"<jats:p>Building an accurate disease risk prediction model is an essential step in the modern quest for precision medicine. 
While high-dimensional genomic data provide valuable resources for investigating disease risk, their large amount of noise and the complex relationships between predictors and outcomes pose tremendous analytical challenges. Deep learning models are the state-of-the-art methods for many prediction tasks and offer a promising framework for the analysis of genomic data. However, deep learning models generally suffer from the curse of dimensionality and a lack of biological interpretability, both of which have greatly limited their applications. In this work, we have developed a deep neural network (DNN) based prediction modeling framework. We first proposed a group-wise feature importance score for feature selection, where genes harboring genetic variants with both linear and non-linear effects are efficiently detected. We then designed an explainable transfer-learning based DNN method, which can directly incorporate information from feature selection and accurately capture complex predictive effects. The proposed DNN framework is biologically interpretable, as it is built on the selected predictive genes. It is also computationally efficient and can be applied to genome-wide data. 
Through extensive simulations and real data analyses, we have demonstrated that our proposed method can not only efficiently detect predictive features, but also accurately predict disease risk, as compared to many existing methods.<\/jats:p>","DOI":"10.1371\/journal.pcbi.1010328","type":"journal-article","created":{"date-parts":[[2022,7,15]],"date-time":"2022-07-15T13:36:34Z","timestamp":1657892194000},"page":"e1010328","update-policy":"https:\/\/doi.org\/10.1371\/journal.pcbi.corrections_policy","source":"Crossref","is-referenced-by-count":26,"title":["Explainable deep transfer learning model for disease risk prediction using high-dimensional genomic data"],"prefix":"10.1371","volume":"18","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6646-1190","authenticated-orcid":true,"given":"Long","family":"Liu","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9194-8184","authenticated-orcid":true,"given":"Qingyu","family":"Meng","sequence":"additional","affiliation":[]},{"given":"Cherry","family":"Weng","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7943-966X","authenticated-orcid":true,"given":"Qing","family":"Lu","sequence":"additional","affiliation":[]},{"given":"Tong","family":"Wang","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0071-5917","authenticated-orcid":true,"given":"Yalu","family":"Wen","sequence":"additional","affiliation":[]}],"member":"340","published-online":{"date-parts":[[2022,7,15]]},"reference":[{"issue":"21","key":"pcbi.1010328.ref001","doi-asserted-by":"crossref","first-page":"2119","DOI":"10.1001\/jama.2015.3595","article-title":"The precision medicine initiative: a new national effort","volume":"313","author":"EA Ashley","year":"2015","journal-title":"JAMA"},{"issue":"3","key":"pcbi.1010328.ref002","doi-asserted-by":"crossref","first-page":"1135","DOI":"10.1534\/genetics.117.300271","article-title":"Will big data close the missing heritability 
gap?","volume":"207","author":"H Kim","year":"2017","journal-title":"Genetics"},{"issue":"7","key":"pcbi.1010328.ref003","doi-asserted-by":"crossref","first-page":"877","DOI":"10.1038\/ejhg.2017.50","article-title":"Missing heritability: is the gap closing? An analysis of 32 complex traits in the Lifelines Cohort Study","volume":"25","author":"IM Nolte","year":"2017","journal-title":"Eur J Hum Genet"},{"issue":"9","key":"pcbi.1010328.ref004","doi-asserted-by":"crossref","first-page":"882","DOI":"10.1093\/aje\/kwh101","article-title":"Limitations of the odds ratio in gauging the performance of a diagnostic, prognostic, or screening marker","volume":"159","author":"MS Pepe","year":"2004","journal-title":"Am J Epidemiol"},{"issue":"4","key":"pcbi.1010328.ref005","doi-asserted-by":"crossref","first-page":"268","DOI":"10.1002\/gepi.21966","article-title":"Polygenic epidemiology","volume":"40","author":"F Dudbridge","year":"2016","journal-title":"Genet Epidemiol"},{"key":"pcbi.1010328.ref006","doi-asserted-by":"crossref","first-page":"5415","DOI":"10.1093\/bioinformatics\/btaa1023","article-title":"A Bayesian linear mixed model for prediction of complex traits","volume":"36","author":"Y Hai","year":"2020","journal-title":"Bioinformatics"},{"issue":"6","key":"pcbi.1010328.ref007","doi-asserted-by":"crossref","first-page":"469","DOI":"10.1002\/gepi.22050","article-title":"Polygenic scores via penalized regression on summary statistics","volume":"41","author":"TSH Mak","year":"2017","journal-title":"Genet Epidemiol"},{"issue":"4","key":"pcbi.1010328.ref008","doi-asserted-by":"crossref","first-page":"599","DOI":"10.1089\/cmb.2019.0325","article-title":"Efficient estimation and applications of cross-validated genetic predictions to polygenic risk scores and linear mixed models","volume":"27","author":"J Mefford","year":"2020","journal-title":"J Comput 
Biol"},{"key":"pcbi.1010328.ref009","doi-asserted-by":"crossref","first-page":"5424","DOI":"10.1093\/bioinformatics\/btaa1029","article-title":"LDpred2: better, faster, stronger","volume":"36","author":"F Prive","year":"2020","journal-title":"Bioinformatics"},{"issue":"9","key":"pcbi.1010328.ref010","doi-asserted-by":"crossref","first-page":"1550","DOI":"10.1101\/gr.169375.113","article-title":"MultiBLUP: improved SNP-based prediction for complex traits","volume":"24","author":"D Speed","year":"2014","journal-title":"Genome Res"},{"issue":"7","key":"pcbi.1010328.ref011","doi-asserted-by":"crossref","first-page":"969","DOI":"10.1101\/gr.201996.115","article-title":"Multikernel linear mixed models for complex phenotype prediction","volume":"26","author":"O Weissbrod","year":"2016","journal-title":"Genome Res"},{"issue":"9","key":"pcbi.1010328.ref012","doi-asserted-by":"crossref","first-page":"1311","DOI":"10.1002\/sim.8477","article-title":"Multikernel linear mixed model with adaptive lasso for complex phenotype prediction","volume":"39","author":"Y Wen","year":"2020","journal-title":"Stat Med"},{"issue":"5","key":"pcbi.1010328.ref013","doi-asserted-by":"crossref","first-page":"679","DOI":"10.1016\/j.ajhg.2020.03.013","article-title":"Accurate and scalable construction of polygenic scores in large biobank data sets","volume":"106","author":"S Yang","year":"2020","journal-title":"Am J Hum Genet"},{"issue":"7256","key":"pcbi.1010328.ref014","doi-asserted-by":"crossref","first-page":"748","DOI":"10.1038\/nature08185","article-title":"Common polygenic variation contributes to risk of schizophrenia and bipolar disorder","volume":"460","author":"C International Schizophrenia","year":"2009","journal-title":"Nature"},{"issue":"1","key":"pcbi.1010328.ref015","doi-asserted-by":"crossref","first-page":"456","DOI":"10.1038\/s41467-017-00470-2","article-title":"Non-parametric genetic prediction of complex traits with latent Dirichlet process regression 
models","volume":"8","author":"P Zeng","year":"2017","journal-title":"Nat Commun"},{"issue":"4-5","key":"pcbi.1010328.ref016","doi-asserted-by":"crossref","first-page":"245","DOI":"10.1177\/1471082X17698255","article-title":"Statistical contributions to Bioinformatics: design, modelling, structure learning and integration","volume":"17","author":"JS Morris","year":"2017","journal-title":"Stat Model"},{"issue":"6","key":"pcbi.1010328.ref017","doi-asserted-by":"crossref","first-page":"1785","DOI":"10.1093\/bioinformatics\/btz822","article-title":"Multi-kernel linear mixed model with adaptive lasso for prediction analysis on high-dimensional multi-omics data","volume":"36","author":"J Li","year":"2020","journal-title":"Bioinformatics"},{"issue":"7","key":"pcbi.1010328.ref018","doi-asserted-by":"crossref","first-page":"389","DOI":"10.1038\/s41576-019-0122-6","article-title":"Deep learning: new computational modelling techniques for genomics","volume":"20","author":"G Eraslan","year":"2019","journal-title":"Nat Rev Genet"},{"issue":"1","key":"pcbi.1010328.ref019","doi-asserted-by":"crossref","first-page":"12","DOI":"10.1038\/s41588-018-0295-5","article-title":"A primer on deep learning in genomics","volume":"51","author":"J Zou","year":"2019","journal-title":"Nat Genet"},{"issue":"26","key":"pcbi.1010328.ref020","doi-asserted-by":"crossref","first-page":"3764","DOI":"10.1002\/sim.7832","article-title":"Genetic risk prediction using a spatial autoregressive model with adaptive lasso","volume":"37","author":"Y Wen","year":"2018","journal-title":"Stat Med"},{"issue":"5","key":"pcbi.1010328.ref021","doi-asserted-by":"crossref","first-page":"2055","DOI":"10.1214\/15-AOS1337","article-title":"Controlling the false discovery rate via knockoffs","volume":"43","author":"RF Barber","year":"2015","journal-title":"Ann Statist"},{"issue":"3","key":"pcbi.1010328.ref022","doi-asserted-by":"crossref","first-page":"551","DOI":"10.1111\/rssb.12265","article-title":"Panning for gold: 
\u2018model-X\u2019 knockoffs for high dimensional controlled variable selection","volume":"80","author":"E Cand\u00e8s","year":"2018","journal-title":"J R Stat Soc B"},{"key":"pcbi.1010328.ref023","unstructured":"Lu Y, Fan Y, Lv J, Stafford Noble W. DeepPINK: reproducible feature selection in deep neural networks. In: Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R, editors. Adv Neural Inf Process Syst. vol. 31. Curran Associates, Inc.; 2018.Available from: https:\/\/proceedings.neurips.cc\/paper\/2018\/file\/29daf9442f3c0b60642b14c081b4a556-Paper.pdf."},{"issue":"3","key":"pcbi.1010328.ref024","doi-asserted-by":"crossref","first-page":"1409","DOI":"10.1214\/19-AOS1852","article-title":"Robust inference with knockoffs","volume":"48","author":"RF Barber","year":"2020","journal-title":"Ann Statist"},{"issue":"5","key":"pcbi.1010328.ref025","doi-asserted-by":"crossref","first-page":"3021","DOI":"10.1214\/19-AOS1920","article-title":"Relaxing the assumptions of knockoffs by conditioning","volume":"48","author":"D Huang","year":"2020","journal-title":"Ann Statist"},{"issue":"532","key":"pcbi.1010328.ref026","doi-asserted-by":"crossref","first-page":"1861","DOI":"10.1080\/01621459.2019.1660174","article-title":"Deep knockoffs","volume":"115","author":"Y Romano","year":"2020","journal-title":"J Am Stat Assoc"},{"key":"pcbi.1010328.ref027","doi-asserted-by":"crossref","unstructured":"Xing X, Gui Y, Dai C, Liu JS. NGM: Neural Gaussian Mirror for Controlled Feature Selection in Neural Networks. In: 2020 19th IEEE International Conference on Machine Learning and Applications (ICMLA); 2020. p. 148\u2013152.","DOI":"10.1109\/ICMLA51294.2020.00032"},{"key":"pcbi.1010328.ref028","unstructured":"Dai C, Lin B, Xing X, Liu JS. 
False discovery rate control via data splitting; 2020."},{"key":"pcbi.1010328.ref029","doi-asserted-by":"crossref","first-page":"139","DOI":"10.1016\/j.neunet.2020.12.004","article-title":"Deep-gKnock: nonlinear group-feature selection with deep neural networks","volume":"135","author":"G Zhu","year":"2021","journal-title":"Neural Networks"},{"issue":"3","key":"pcbi.1010328.ref030","doi-asserted-by":"crossref","first-page":"265","DOI":"10.1016\/j.jalz.2010.03.013","article-title":"Alzheimer\u2019s Disease Neuroimaging Initiative biomarkers as quantitative phenotypes: Genetics core aims, progress, and plans","volume":"6","author":"AJ Saykin","year":"2010","journal-title":"Alzheimers Dement"},{"issue":"5","key":"pcbi.1010328.ref031","doi-asserted-by":"crossref","first-page":"849","DOI":"10.1111\/j.1467-9868.2008.00674.x","article-title":"Sure independence screening for ultrahigh dimensional feature space","volume":"70","author":"J Fan","year":"2008","journal-title":"J R Stat Soc B"},{"issue":"14","key":"pcbi.1010328.ref032","doi-asserted-by":"crossref","first-page":"i427","DOI":"10.1093\/bioinformatics\/btz333","article-title":"Block HSIC Lasso: model-free biomarker detection for ultra-high dimensional data","volume":"35","author":"H Climente-Gonz\u00e1lez","year":"2019","journal-title":"Bioinformatics"},{"issue":"7","key":"pcbi.1010328.ref033","doi-asserted-by":"crossref","first-page":"565","DOI":"10.1038\/ng.608","article-title":"Common SNPs explain a large proportion of the heritability for human height","volume":"42","author":"J Yang","year":"2010","journal-title":"Nat Genet"},{"issue":"2","key":"pcbi.1010328.ref034","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1371\/journal.pgen.1003264","article-title":"Polygenic modeling with bayesian sparse linear mixed models","volume":"9","author":"X Zhou","year":"2013","journal-title":"PLOS 
Genetics"},{"issue":"4","key":"pcbi.1010328.ref035","doi-asserted-by":"crossref","first-page":"762","DOI":"10.1093\/biostatistics\/kxs014","article-title":"Optimal tests for rare variant effects in sequencing association studies","volume":"13","author":"S Lee","year":"2012","journal-title":"Biostatistics"},{"issue":"2","key":"pcbi.1010328.ref036","doi-asserted-by":"crossref","first-page":"224","DOI":"10.1016\/j.ajhg.2012.06.007","article-title":"Optimal unified approach for rare-variant association testing with application to small-sample case-control whole-exome sequencing studies","volume":"91","author":"S Lee","year":"2012","journal-title":"Am J Hum Genet"},{"issue":"3","key":"pcbi.1010328.ref037","doi-asserted-by":"crossref","first-page":"410","DOI":"10.1016\/j.ajhg.2019.01.002","article-title":"ACAT: a fast and powerful p value combination method for rare-variant analysis in sequencing studies","volume":"104","author":"Y Liu","year":"2019","journal-title":"Am J Hum Genet"},{"issue":"4","key":"pcbi.1010328.ref038","doi-asserted-by":"crossref","first-page":"359","DOI":"10.1212\/WNL.0b013e31827f0889","article-title":"Differential effect of APOE genotype on amyloid load and glucose metabolism in AD dementia","volume":"80","author":"R Ossenkoppele","year":"2013","journal-title":"Neurology"},{"issue":"5","key":"pcbi.1010328.ref039","doi-asserted-by":"crossref","first-page":"536","DOI":"10.1001\/archneurol.2010.88","article-title":"An inherited variable poly-T repeat genotype in TOMM40 in Alzheimer\u2019s disease","volume":"67","author":"AD Roses","year":"2010","journal-title":"Arch Neurol-chicago"},{"issue":"1","key":"pcbi.1010328.ref040","doi-asserted-by":"crossref","first-page":"74","DOI":"10.1038\/ng0594-74","article-title":"Apolipoprotein E4 allele in a population\u2013based study of early\u2013onset Alzheimer\u2019s disease","volume":"7","author":"CM van Duijn","year":"1994","journal-title":"Nat 
Genet"},{"issue":"1","key":"pcbi.1010328.ref041","doi-asserted-by":"crossref","first-page":"e87017","DOI":"10.1371\/journal.pone.0087017","article-title":"Association between APOC1 polymorphism and Alzheimer\u2019s disease: a case-control study and meta-analysis","volume":"9","author":"Q Zhou","year":"2014","journal-title":"PloS one"},{"key":"pcbi.1010328.ref042","doi-asserted-by":"crossref","first-page":"142","DOI":"10.1016\/j.neulet.2016.05.050","article-title":"The TOMM40 gene rs2075650 polymorphism contributes to Alzheimer\u2019s disease in Caucasian, and Asian populations","volume":"628","author":"H Huang","year":"2016","journal-title":"Neurosci Lett"},{"issue":"10","key":"pcbi.1010328.ref043","doi-asserted-by":"crossref","first-page":"751","DOI":"10.1001\/jama.279.10.751","article-title":"The APOE \u03f54 allele and the risk of Alzheimer\u2019s disease among African Americans, whites, and Hispanics","volume":"279","author":"MX Tang","year":"1998","journal-title":"Jama"},{"issue":"4","key":"pcbi.1010328.ref044","doi-asserted-by":"crossref","first-page":"594","DOI":"10.1001\/archneur.59.4.594","article-title":"Association between apolipoprotein E genotype and Alzheimer\u2019s disease in African American subjects","volume":"59","author":"NR Graff-Radford","year":"2002","journal-title":"Arch Neurol-chicago"},{"issue":"81","key":"pcbi.1010328.ref045","doi-asserted-by":"crossref","first-page":"35207","DOI":"10.18632\/oncotarget.26184","article-title":"Biothiols and oxidative stress markers and polymorphisms of TOMM40 and APOC1 genes in Alzheimer\u2019s disease patients","volume":"9","author":"M Prendecki","year":"2018","journal-title":"Oncotarget"},{"key":"pcbi.1010328.ref046","unstructured":"Devlin J, Chang MW, Lee K, Toutanova K. BERT: pre-training of deep bidirectional transformers for language understanding. arXiv e-prints. 2018; p. 
arXiv:1810.04805."},{"issue":"6","key":"pcbi.1010328.ref047","doi-asserted-by":"crossref","first-page":"653","DOI":"10.4103\/1673-5374.130117","article-title":"APOE and APOC1 gene polymorphisms are associated with cognitive impairment progression in Chinese patients with late-onset Alzheimer\u2019s disease","volume":"9","author":"Q Zhou","year":"2014","journal-title":"Neural Regener Res"},{"key":"pcbi.1010328.ref048","doi-asserted-by":"crossref","first-page":"142","DOI":"10.1016\/j.neulet.2016.05.050","article-title":"The TOMM40 gene rs2075650 polymorphism contributes to Alzheimer\u2019s disease in Caucasian, and Asian populations","volume":"628","author":"H Huang","year":"2016","journal-title":"Neurosci Lett"},{"issue":"5","key":"pcbi.1010328.ref049","doi-asserted-by":"crossref","first-page":"375","DOI":"10.1038\/tpj.2009.69","article-title":"A TOMM40 variable-length polymorphism predicts the age of late-onset Alzheimer\u2019s disease","volume":"10","author":"AD Roses","year":"2010","journal-title":"Pharmacogenomics J"},{"issue":"9","key":"pcbi.1010328.ref050","doi-asserted-by":"crossref","first-page":"1156","DOI":"10.1038\/nn.3786","article-title":"Alzheimer\u2019s disease: early alterations in brain DNA methylation at ANK1, BIN1, RHBDF2 and other loci","volume":"17","author":"PL De Jager","year":"2014","journal-title":"Nature neuroscience"},{"issue":"4","key":"pcbi.1010328.ref051","doi-asserted-by":"crossref","first-page":"305","DOI":"10.1097\/WAD.0000000000000142","article-title":"Association Analysis of Polymorphisms in TOMM40, CR1, PVRL2, SORL1, PICALM, and 14q32.13 Regions in Colombian Alzheimer Disease Patients","volume":"30","author":"J Ortega-Rojas","year":"2016","journal-title":"Alzheimer Dis Assoc Disord"},{"key":"pcbi.1010328.ref052","unstructured":"Molchanov D, Ashukha A, Vetrov D. Variational Dropout Sparsifies Deep Neural Networks. In: Proceedings of the 34th International Conference on Machine Learning\u2014Volume 70. ICML\u201917. JMLR.org; 2017. 
p. 2498\u20132507."}],"updated-by":[{"DOI":"10.1371\/journal.pcbi.1010328","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2022,7,27]],"date-time":"2022-07-27T00:00:00Z","timestamp":1658880000000}}],"container-title":["PLOS Computational Biology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1010328","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,11]],"date-time":"2023-02-11T13:27:36Z","timestamp":1676122056000},"score":1,"resource":{"primary":{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1010328"}},"subtitle":[],"editor":[{"given":"Wei","family":"Li","sequence":"first","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2022,7,15]]},"references-count":52,"journal-issue":{"issue":"7","published-online":{"date-parts":[[2022,7,15]]}},"URL":"https:\/\/doi.org\/10.1371\/journal.pcbi.1010328","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2022.01.27.22269862","asserted-by":"object"}]},"ISSN":["1553-7358"],"issn-type":[{"value":"1553-7358","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,7,15]]}}}