{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,6]],"date-time":"2025-11-06T20:13:53Z","timestamp":1762460033135,"version":"3.41.2"},"reference-count":61,"publisher":"Oxford University Press (OUP)","issue":"1","license":[{"start":{"date-parts":[[2021,10,13]],"date-time":"2021-10-13T00:00:00Z","timestamp":1634083200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/501100012166","name":"National Key Research and Development Program of China","doi-asserted-by":"publisher","award":["2019YFA0709501"],"award-info":[{"award-number":["2019YFA0709501"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Strategic Priority Research Program of the Chinese Academy of Sciences","award":["XDPB17"],"award-info":[{"award-number":["XDPB17"]}]},{"name":"Key-Area Research and Development of Guangdong Province","award":["2020B1111190001"],"award-info":[{"award-number":["2020B1111190001"]}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61621003"],"award-info":[{"award-number":["61621003"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,1,17]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Understanding the impact of non-coding sequence variants on complex diseases is an essential problem. We present a novel ensemble learning framework\u2014CASAVA, to predict genomic loci in terms of disease category-specific risk. 
Using disease-associated variants identified by GWAS as training data, and diverse sequencing-based genomics and epigenomics profiles as features, CASAVA provides risk prediction of 24 major categories of diseases throughout the human genome. Our studies showed that CASAVA scores at a genomic locus provide a reasonable prediction of the disease-specific and disease category-specific risk for non-coding variants located within the locus. Taking MHC2TA and immune system diseases as an example, we demonstrate the potential of CASAVA in revealing variant-disease associations. A website (http:\/\/zhanglabtools.org\/CASAVA) has been built to facilitate easy access to CASAVA scores.<\/jats:p>","DOI":"10.1093\/bib\/bbab438","type":"journal-article","created":{"date-parts":[[2021,9,30]],"date-time":"2021-09-30T19:36:35Z","timestamp":1633030595000},"source":"Crossref","is-referenced-by-count":8,"title":["Disease category-specific annotation of variants using an ensemble learning framework"],"prefix":"10.1093","volume":"23","author":[{"given":"Zhen","family":"Cao","sequence":"first","affiliation":[{"name":"NCMIS, CEMS, RCSDS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China"},{"name":"School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China"}]},{"given":"Yanting","family":"Huang","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Emory University, Atlanta, GA 30322, USA"}]},{"given":"Ran","family":"Duan","sequence":"additional","affiliation":[{"name":"Department of Software Engineering, Yunnan University, Kunming 650500, China"}]},{"given":"Peng","family":"Jin","sequence":"additional","affiliation":[{"name":"Department of Human Genetics, Emory University School of Medicine, Atlanta, GA 30322, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1583-146X","authenticated-orcid":false,"given":"Zhaohui 
S","family":"Qin","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Emory University, Atlanta, GA 30322, USA"},{"name":"Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA 30322, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0192-7118","authenticated-orcid":false,"given":"Shihua","family":"Zhang","sequence":"additional","affiliation":[{"name":"NCMIS, CEMS, RCSDS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China"},{"name":"School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China"},{"name":"Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming 650223, China"},{"name":"Key Laboratory of Systems Biology, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Hangzhou 310024, China"}]}],"member":"286","published-online":{"date-parts":[[2021,10,13]]},"reference":[{"key":"2022011921341156000_ref1","doi-asserted-by":"crossref","first-page":"469","DOI":"10.1038\/nature13127","article-title":"Guidelines for investigating causality of sequence variants in human disease","volume":"508","author":"MacArthur","year":"2014","journal-title":"Nature"},{"key":"2022011921341156000_ref2","doi-asserted-by":"crossref","first-page":"415","DOI":"10.1038\/nrg2779","article-title":"Uncovering the roles of rare variants in common disease through whole-genome sequencing","volume":"11","author":"Cirulli","year":"2010","journal-title":"Nat Rev Genet"},{"key":"2022011921341156000_ref3","doi-asserted-by":"crossref","first-page":"D1001","DOI":"10.1093\/nar\/gkt1229","article-title":"The NHGRI GWAS Catalog, a curated resource of SNP-trait associations","volume":"42","author":"Welter","year":"2013","journal-title":"Nucleic Acids 
Res"},{"key":"2022011921341156000_ref4","doi-asserted-by":"crossref","first-page":"1325","DOI":"10.1177\/1535370217713750","article-title":"Challenges and progress in interpretation of non-coding genetic variants associated with human disease","volume":"242","author":"Zhu","year":"2017","journal-title":"Exp Biol Med"},{"key":"2022011921341156000_ref5","doi-asserted-by":"crossref","first-page":"R102","DOI":"10.1093\/hmg\/ddv259","article-title":"Non-coding genetic variants in human disease","volume":"24","author":"Zhang","year":"2015","journal-title":"Hum Mol Genet"},{"key":"2022011921341156000_ref6","doi-asserted-by":"crossref","first-page":"636","DOI":"10.1126\/science.1105136","article-title":"The ENCODE (ENCyclopedia of DNA elements) project","volume":"306","author":"ENCODE Project Consortium","year":"2004","journal-title":"Science"},{"key":"2022011921341156000_ref7","doi-asserted-by":"crossref","first-page":"1045","DOI":"10.1038\/nbt1010-1045","article-title":"The NIH roadmap epigenomics mapping consortium","volume":"28","author":"Bernstein","year":"2010","journal-title":"Nat Biotechnol"},{"key":"2022011921341156000_ref8","first-page":"1639\u201354","article-title":"Regulatory variants: from detection to predicting impact","volume":"20","author":"Rojano","year":"2018","journal-title":"Brief Bioinform"},{"key":"2022011921341156000_ref9","doi-asserted-by":"crossref","first-page":"310","DOI":"10.1038\/ng.2892","article-title":"A general framework for estimating the relative pathogenicity of human genetic variants","volume":"46","author":"Kircher","year":"2014","journal-title":"Nat Genet"},{"key":"2022011921341156000_ref10","doi-asserted-by":"crossref","first-page":"761","DOI":"10.1093\/bioinformatics\/btu703","article-title":"DANN: a deep learning approach for annotating the pathogenicity of genetic 
variants","volume":"31","author":"Quang","year":"2015","journal-title":"Bioinformatics"},{"key":"2022011921341156000_ref11","doi-asserted-by":"crossref","first-page":"294","DOI":"10.1038\/nmeth.2832","article-title":"Functional annotation of noncoding sequence variants","volume":"11","author":"Ritchie","year":"2014","journal-title":"Nat Methods"},{"key":"2022011921341156000_ref12","doi-asserted-by":"crossref","first-page":"1536","DOI":"10.1093\/bioinformatics\/btv009","article-title":"An integrative approach to predicting the functional effects of non-coding and coding sequence variation","volume":"31","author":"Shihab","year":"2015","journal-title":"Bioinformatics"},{"key":"2022011921341156000_ref13","doi-asserted-by":"crossref","first-page":"10576","DOI":"10.1038\/srep10576","article-title":"A statistical framework to predict functional non-coding regions in the human genome through integrated analysis of annotation data","volume":"5","author":"Lu","year":"2015","journal-title":"Sci Rep"},{"key":"2022011921341156000_ref14","doi-asserted-by":"crossref","first-page":"955","DOI":"10.1038\/ng.3331","article-title":"A method to predict the impact of regulatory variants from DNA sequence","volume":"47","author":"Lee","year":"2015","journal-title":"Nat Genet"},{"key":"2022011921341156000_ref15","doi-asserted-by":"crossref","first-page":"931","DOI":"10.1038\/nmeth.3547","article-title":"Predicting effects of noncoding variants with deep learning\u2013based sequence model","volume":"12","author":"Zhou","year":"2015","journal-title":"Nat Methods"},{"key":"2022011921341156000_ref16","doi-asserted-by":"crossref","first-page":"214","DOI":"10.1038\/ng.3477","article-title":"A spectral approach integrating functional genomic annotations for coding and noncoding variants","volume":"48","author":"Ionita-Laza","year":"2016","journal-title":"Nat 
Genet"},{"key":"2022011921341156000_ref17","doi-asserted-by":"crossref","first-page":"2729","DOI":"10.1093\/bioinformatics\/btw288","article-title":"Predicting regulatory variants with composite statistic","volume":"32","author":"Li","year":"2016","journal-title":"Bioinformatics"},{"key":"2022011921341156000_ref18","doi-asserted-by":"crossref","first-page":"252","DOI":"10.1186\/s13059-016-1112-z","article-title":"DIVAN: accurate identification of non-coding disease-specific risk variants using multi-omics profiles","volume":"17","author":"Chen","year":"2016","journal-title":"Genome Biol"},{"key":"2022011921341156000_ref19","doi-asserted-by":"crossref","first-page":"618","DOI":"10.1038\/ng.3810","article-title":"Fast, scalable prediction of deleterious noncoding variants from functional and population genomic data","volume":"49","author":"Huang","year":"2017","journal-title":"Nat Genet"},{"key":"2022011921341156000_ref20","doi-asserted-by":"crossref","first-page":"702","DOI":"10.1038\/s41467-018-03133-y","article-title":"Identifying noncoding risk variants using disease-relevant gene regulatory networks","volume":"9","author":"Gao","year":"2018","journal-title":"Nat Commun"},{"key":"2022011921341156000_ref21","doi-asserted-by":"crossref","first-page":"1171","DOI":"10.1038\/s41588-018-0160-6","article-title":"Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk","volume":"50","author":"Zhou","year":"2018","journal-title":"Nat Genet"},{"key":"2022011921341156000_ref22","doi-asserted-by":"crossref","first-page":"53","DOI":"10.1186\/s13073-018-0565-y","article-title":"Prioritization and functional assessment of noncoding variants associated with complex diseases","volume":"10","author":"Zhou","year":"2018","journal-title":"Genome Med"},{"key":"2022011921341156000_ref23","doi-asserted-by":"crossref","first-page":"1573","DOI":"10.1093\/bioinformatics\/bty872","article-title":"TIVAN: tissue-specific cis-eQTL single nucleotide 
variant annotation and prediction","volume":"35","author":"Chen","year":"2019","journal-title":"Bioinformatics"},{"key":"2022011921341156000_ref24","doi-asserted-by":"crossref","first-page":"144","DOI":"10.1038\/ejhg.2013.96","article-title":"Phenotype\u2013genotype integrator (PheGenI): synthesizing genome-wide association study (GWAS) data with existing genomic resources","volume":"22","author":"Ramos","year":"2014","journal-title":"Eur J Hum Genet"},{"key":"2022011921341156000_ref25","doi-asserted-by":"crossref","first-page":"317","DOI":"10.1136\/jamia.2001.0080317","article-title":"Medical subject headings used to search the biomedical literature","volume":"8","author":"Coletti","year":"2001","journal-title":"J Am Med Inform Assoc"},{"key":"2022011921341156000_ref26","doi-asserted-by":"crossref","first-page":"1061","DOI":"10.1038\/nature09534","article-title":"A map of human genome variation from population-scale sequencing","volume":"467","author":"1000 Genomes Project Consortium","year":"2010","journal-title":"Nature"},{"key":"2022011921341156000_ref27","doi-asserted-by":"crossref","first-page":"539","DOI":"10.1109\/TSMCB.2008.2007853","article-title":"Exploratory undersampling for class-imbalance learning","volume":"39","author":"Liu","year":"2009","journal-title":"IEEE Trans Syst Man Cybern B Cybern"},{"key":"2022011921341156000_ref28","doi-asserted-by":"crossref","first-page":"785","DOI":"10.1145\/2939672.2939785","volume-title":"Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","author":"Chen","year":"2016"},{"key":"2022011921341156000_ref29","doi-asserted-by":"crossref","first-page":"122","DOI":"10.1186\/s13059-016-0974-4","article-title":"The ensembl variant effect predictor","volume":"17","author":"Ahn","year":"2016","journal-title":"Genome Biol"},{"key":"2022011921341156000_ref30","doi-asserted-by":"crossref","first-page":"455","DOI":"10.1038\/nature12787","article-title":"An atlas of active enhancers 
across human cell types and tissues","volume":"507","author":"Andersson","year":"2014","journal-title":"Nature"},{"key":"2022011921341156000_ref31","doi-asserted-by":"crossref","first-page":"1345","DOI":"10.1109\/TKDE.2009.191","article-title":"A survey on transfer learning","volume":"22","author":"Pan","year":"2009","journal-title":"IEEE T Knowl Data En"},{"key":"2022011921341156000_ref32","first-page":"233","volume-title":"Proceedings of the 23rd International Conference on Machine learning","author":"Avis","year":"2006"},{"key":"2022011921341156000_ref33","doi-asserted-by":"crossref","first-page":"e0118432","DOI":"10.1371\/journal.pone.0118432","article-title":"The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets","volume":"10","author":"Saito","year":"2015","journal-title":"PloS one"},{"key":"2022011921341156000_ref34","doi-asserted-by":"crossref","first-page":"3940","DOI":"10.1093\/bioinformatics\/bti623","article-title":"ROCR: visualizing classifier performance in R","volume":"21","author":"Sing","year":"2005","journal-title":"Bioinformatics"},{"key":"2022011921341156000_ref35","doi-asserted-by":"crossref","first-page":"W109","DOI":"10.1093\/nar\/gky399","article-title":"SNPnexus: assessing the functional relevance of genetic variation to facilitate the promise of precision medicine","volume":"46","author":"Dayem Ullah","year":"2018","journal-title":"Nucleic Acids Res"},{"key":"2022011921341156000_ref36","doi-asserted-by":"crossref","first-page":"431","DOI":"10.1038\/ng0504-431","article-title":"The genetic association database","volume":"36","author":"Becker","year":"2004","journal-title":"Nat Genet"},{"key":"2022011921341156000_ref37","doi-asserted-by":"crossref","first-page":"D805","DOI":"10.1093\/nar\/gku1075","article-title":"COSMIC: exploring the world's knowledge of somatic mutations in human cancer","volume":"43","author":"Forbes","year":"2015","journal-title":"Nucleic Acids 
Res"},{"key":"2022011921341156000_ref38","doi-asserted-by":"crossref","first-page":"D980","DOI":"10.1093\/nar\/gkt1113","article-title":"ClinVar: public archive of relationships among sequence variation and human phenotype","volume":"42","author":"Landrum","year":"2014","journal-title":"Nucleic Acids Res"},{"key":"2022011921341156000_ref39","doi-asserted-by":"crossref","first-page":"1190","DOI":"10.1126\/science.1222794","article-title":"Systematic localization of common disease-associated variation in regulatory DNA","volume":"337","author":"Maurano","year":"2012","journal-title":"Science"},{"key":"2022011921341156000_ref40","doi-asserted-by":"crossref","first-page":"1181","DOI":"10.1038\/ng1007-1181","article-title":"The NCBI dbGaP database of genotypes and phenotypes","volume":"39","author":"Mailman","year":"2007","journal-title":"Nat Genet"},{"key":"2022011921341156000_ref41","doi-asserted-by":"crossref","first-page":"979","DOI":"10.1038\/ng.3359","article-title":"Association analyses identify 38 susceptibility loci for inflammatory bowel disease and highlight shared genetic risk across populations","volume":"47","author":"Liu","year":"2015","journal-title":"Nat Genet"},{"key":"2022011921341156000_ref42","doi-asserted-by":"crossref","first-page":"55","DOI":"10.1016\/j.schres.2012.10.010","article-title":"Expression of autism spectrum and schizophrenia in patients with a 
22q11.2 deletion","volume":"143","author":"Vorstman","year":"2013","journal-title":"Schizophr Res"},{"key":"2022011921341156000_ref43","doi-asserted-by":"crossref","first-page":"325","DOI":"10.1136\/ard.2006.059428","article-title":"Role of the MHC2TA gene in autoimmune diseases","volume":"66","author":"Mart\u00ednez","year":"2007","journal-title":"Ann Rheum Dis"},{"key":"2022011921341156000_ref44","doi-asserted-by":"crossref","first-page":"486","DOI":"10.1038\/ng1544","article-title":"MHC2TA is associated with differential MHC molecule expression and susceptibility to rheumatoid arthritis, multiple sclerosis and myocardial infarction","volume":"37","author":"Swanberg","year":"2005","journal-title":"Nat Genet"},{"key":"2022011921341156000_ref45","doi-asserted-by":"crossref","first-page":"274","DOI":"10.1136\/ard.2006.063347","article-title":"MHC2TA is associated with rheumatoid arthritis in Japanese patients","volume":"66","author":"Iikuni","year":"2007","journal-title":"Ann Rheum Dis"},{"key":"2022011921341156000_ref46","doi-asserted-by":"crossref","first-page":"1234","DOI":"10.1038\/ng.472","article-title":"Genome-wide association study in a Chinese Han population identifies nine new susceptibility loci for systemic lupus erythematosus","volume":"41","author":"Han","year":"2009","journal-title":"Nat Genet"},{"key":"2022011921341156000_ref47","doi-asserted-by":"crossref","first-page":"4017","DOI":"10.1182\/blood-2014-12-580068","article-title":"CD19-targeted chimeric antigen receptor T-cell therapy for acute lymphoblastic leukemia","volume":"125","author":"Maude","year":"2015","journal-title":"Blood"},{"key":"2022011921341156000_ref48","doi-asserted-by":"crossref","first-page":"377","DOI":"10.1038\/nmeth.1315","article-title":"mRNA-Seq whole-transcriptome analysis of a single cell","volume":"6","author":"Tang","year":"2009","journal-title":"Nat 
Methods"},{"key":"2022011921341156000_ref49","doi-asserted-by":"crossref","first-page":"486","DOI":"10.1038\/nature14590","article-title":"Single-cell chromatin accessibility reveals principles of regulatory variation","volume":"523","author":"Buenrostro","year":"2015","journal-title":"Nature"},{"key":"2022011921341156000_ref50","doi-asserted-by":"crossref","first-page":"268","DOI":"10.1016\/j.ymeth.2012.05.001","article-title":"Hi-C: a comprehensive technique to capture the conformation of genomes","volume":"58","author":"Belton","year":"2012","journal-title":"Methods"},{"key":"2022011921341156000_ref51","doi-asserted-by":"crossref","first-page":"215","DOI":"10.1038\/nmeth.1906","article-title":"ChromHMM: automating chromatin-state discovery and characterization","volume":"9","author":"Ernst","year":"2012","journal-title":"Nat Methods"},{"key":"2022011921341156000_ref52","doi-asserted-by":"crossref","first-page":"473","DOI":"10.1038\/nmeth.1937","article-title":"Unsupervised pattern discovery in human chromatin structure through genomic segmentation","volume":"9","author":"Hoffman","year":"2012","journal-title":"Nat Methods"},{"key":"2022011921341156000_ref53","doi-asserted-by":"crossref","first-page":"244","DOI":"10.1038\/s41586-020-2559-3","article-title":"Index and biological spectrum of human DNase I hypersensitive sites","volume":"584","author":"Meuleman","year":"2020","journal-title":"Nature"},{"key":"2022011921341156000_ref54","doi-asserted-by":"crossref","first-page":"9230","DOI":"10.1093\/nar\/gkt712","article-title":"Discovery of cell-type specific regulatory elements in the human genome using differential chromatin modification analysis","volume":"41","author":"Chen","year":"2013","journal-title":"Nucleic Acids Res"},{"key":"2022011921341156000_ref55","doi-asserted-by":"crossref","first-page":"9823","DOI":"10.1093\/nar\/gkx659","article-title":"Accurate and reproducible functional maps in 127 human cell types via 2D genome 
segmentation","volume":"45","author":"Zhang","year":"2017","journal-title":"Nucleic Acids Res"},{"key":"2022011921341156000_ref56","doi-asserted-by":"crossref","first-page":"533","DOI":"10.1093\/bioinformatics\/btt012","article-title":"Sparsely correlated hidden Markov models with application to genome-wide location studies","volume":"29","author":"Choi","year":"2013","journal-title":"Bioinformatics"},{"key":"2022011921341156000_ref57","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/1471-2105-14-S18-S1","article-title":"Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool","volume":"14","author":"Chen","year":"2013","journal-title":"BMC Bioinformatics"},{"key":"2022011921341156000_ref58","doi-asserted-by":"crossref","first-page":"495","DOI":"10.1038\/nbt.1630","article-title":"GREAT improves functional interpretation of cis-regulatory regions","volume":"28","author":"McLean","year":"2010","journal-title":"Nat Biotechnol"},{"key":"2022011921341156000_ref59","doi-asserted-by":"crossref","first-page":"690","DOI":"10.1093\/bioinformatics\/btz669","article-title":"Regulatory annotation of genomic intervals based on tissue-specific expression QTLs","volume":"36","author":"Xu","year":"2020","journal-title":"Bioinformatics"},{"key":"2022011921341156000_ref60","doi-asserted-by":"crossref","first-page":"940","DOI":"10.1093\/nar\/gkr972","article-title":"Disease ontology: a backbone for disease semantic integration","volume":"40","author":"Schriml","year":"2012","journal-title":"Nucleic Acids Res"},{"key":"2022011921341156000_ref61","first-page":"00582","article-title":"Information-theoretic classification accuracy: a criterion that guides data-driven combination of ambiguous outcome labels in multi-class classification","volume":"2109","author":"Zhang","year":"2021","journal-title":"Preprint arXiv arXiv"}],"container-title":["Briefings in 
Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/23\/1\/bbab438\/42230860\/bbab438.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/23\/1\/bbab438\/42230860\/bbab438.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,9]],"date-time":"2023-11-09T19:34:27Z","timestamp":1699558467000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbab438\/6394995"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,10,13]]},"references-count":61,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2022,1,17]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbab438","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"type":"print","value":"1467-5463"},{"type":"electronic","value":"1477-4054"}],"subject":[],"published-other":{"date-parts":[[2022,1]]},"published":{"date-parts":[[2021,10,13]]},"article-number":"bbab438"}}